Foreword



The evolution of the Internet has led us to the new era of the information 
infrastructure. As the information systems operating on the Internet are getting larger 
and more complicated, it is clear that the traditional approaches based on centralized 
mechanisms are no longer meaningful. One typical example can be found in the 
recent growing interest in a P2P (peer-to-peer) computing paradigm. It is quite 
different from the Web-based client-server systems, which adopt essentially 
centralized management mechanisms. The P2P computing environment has the
potential to overcome bottlenecks in the Web computing paradigm, but it introduces
another difficulty, a scalability problem in finding information, if we use a
brute-force flooding mechanism.

As such, conventional information systems have been designed in a centralized 
fashion. As the Internet is deployed on a world scale, however, the information 
systems have been growing, and it becomes more and more difficult to ensure fault- 
free operation. This has long been a fundamental research topic in the field. A 
complex information system is becoming more than we can manage. For these 
reasons, there has recently been a significant increase in interest in biologically 
inspired approaches to designing future information systems that can be managed 
efficiently and correctly. 

Essential for tackling the scalability problem is the introduction of modularity into 
the system. This requires defining the global goal, designing the activity of the local 
small entities, defining the interactions among the entities, and achieving the 
emergence of robust global behavior. The global goal is not the sum of the local 
goals, but more than that. Inspiration from biology, such as the concept of stigmergy 
(i.e., indirect communication via modifications of the environment), is particularly 
useful in the design of information systems that can adapt to unexpected 
environmental changes without preprogrammed system behavior. The development of 
biologically inspired information systems is steadily advancing, but it does not yet 
solve the above problems. 

This book addresses how biological inspiration can help solve problems like this
one, as well as many others in various domains in information technology. It presents 
the current state of the art of the field, and covers various important aspects of 
biologically inspired information systems. 

Chapter 1 contains several papers related to the advancement of our understanding 
of biological systems for evolving information systems. Signorini et al. propose a case 
study where a familiar but very complex and intrinsically woven biocomputing 
system, the blood clotting cascade, is specified using a method from software design
known as object-oriented design (OOD). Then, Shimizu et al. present an analysis of 
responses of complex bionetworks to changes in environmental conditions. 
Kashiwagi et al. discuss experimental molecular evolution showing flexibility of 
fitness, leading to coexistence and diversification in the biological system. Next, 
Mayer et al. introduce a method for adapting the recurrent layer dynamics of an echo- 
state network (ESN) without attempting to train the weights directly. They show that 
self-prediction may improve the performance of an ESN when performing signal 
mappings in the presence of additive noise. Also, Wang et al. describe a nonlinear 
model for the rate of gene transcription, by employing a genetic algorithm to evolve
the structure of a Bayesian network. Their methodology features a reconstruction 
resolution that is limited by data noise. Lastly, Johansson et al. first review the 
structure of the cerebral cortex to find out its number of neurons and synapses and its 
modular structure. The organization of these neurons is then studied and mapped onto 
the framework of an artificial neural network (ANN), showing that it is possible to 
simulate the mouse cortex today on a cluster computer, but not in real time. 

In Chap. 2, Zhou et al. present an emotion-based hierarchical reinforcement 
learning (HRL) algorithm for environments with multiple sources of reward. The 
architecture of the algorithm is inspired by the neurobiology of the brain and 
particularly by those areas responsible for emotions, decision making and behavior 
execution, namely the amygdala, the orbito-frontal cortex and the basal ganglia, 
respectively. Then, Bouchard et al. present two ways in which dynamic self-assembly 
can be used to perform computation, via stochastic protein networks and self- 
assembling software. They describe their protein-emulating agent-based simulation 
infrastructure, which is used for both types of computations, and the few agent 
properties sufficient for dynamic self-assembly. Next, Yoshimoto et al. design a 
system to support the dynamic formation of human relationships between strangers in 
networked virtual space. The system attempts to locate the most suitable virtual space 
for the content of their conversation. Graves et al. apply long short-term memory 
(LSTM) recurrent neural networks (RNNs) to more realistic problems, such as the 
recognition of spoken digits. Without any modification of the underlying algorithm, 
they achieve results comparable to state-of-the-art hidden Markov model (HMM)- 
based recognizers on both the TIDIGITS and TI46 speech corpora. Then, Sharlin et 
al. discuss tangible user interfaces (TUIs) and their potential impact on cognitive 
assessment and cognitive training, and present Cognitive Cubes as an applied test bed. 
Lee et al. discuss the principle of the artificial immune system and propose a virus 
detection system that can detect unknown viruses. Yu et al. discover that a model 
based on lateral disinhibition in biological retinas allows us to explain subtle 
brightness-contrast illusions. Finally, Choe et al. investigate how artificial or natural 
agents can autonomously gain understanding of their own internal (sensory) state, 
and, in the context of a simple biologically motivated sensorimotor agent, they 
propose a new learning criterion based on the maintenance of sensory invariance. 

In Chap. 3, Rubin et al. describe the response properties of a compact low-power 
analog circuit that implements a model of a leaky integrate-and-fire neuron, with 
spike-frequency adaptation, refractory period and voltage-threshold-modulation 
properties. Next, based on the nine construction rules of the so-called Tom Thumb
algorithm, Mange et al. show that cellular division leads to a novel self-replicating
loop endowed with universal construction and computation. Another self-property is 
presented by Petraglio et al., who describe an approach to the implementation on an 
electronic substrate of a process analogous to the cellular division of biological 
organisms. Finally, Upegui et al. present a functional model of a spiking neuron, 
where some features of biological spiking neurons are abstracted, while preserving 
the functionality of the network, in order to define an architecture with low 
implementation cost in field programmable gate arrays (FPGAs). 

In Chap. 4, first, Kondo et al. introduce an adaptive state recruitment strategy that 
enables a learning robot to rearrange its state space conveniently according to the task 
complexity and the progress of the learning. Next, Joshi et al. use simple linear
readouts from generic neural microcircuit models for learning to generate and control 
the basic movements of robots. Then, Labella et al. show that a simple adaptation 
mechanism, inspired by ants’ behavior and based only on information locally 
available to each robot, is effective in increasing the group efficiency. The same 
adaptation mechanism is also responsible for self-organized task allocation in the 
group. Finally, Vardy et al. present a detailed account of the processing that occurs 
within a biologically inspired model for visual homing. They also describe a cellular 
vision matrix that implements CGSM and illustrate how this matrix obeys cellular 
vision. 

Chapter 5 contains the papers related to distributed/parallel processing systems. 
Tsuchiya et al. develop an adaptive mechanism with the aim of enhancing the 
resiliency of epidemic algorithms to perturbations, such as node failures. It 
dynamically adjusts the fan-out, the number of receiver partners each node selects, to 
changes in the environment. Gruau et al. present an implementation of the blob object 
using the “programmable matter” platform of cellular automaton simulation. Then
they describe an implementation of blob division, the machine implementation of 
computer node duplication. Buchli et al. propose a distributed central pattern 
generator model for robotics applications based on phase-sensitivity analysis. Finally, 
Izumi et al. consider an ant-based approach to the agent traversal problem, and 
propose a novel lightweight implementation of the ant system where the unnecessary 
traffic of the network is reduced. 

Chapter 6 treats networking-related issues. Dicke et al. look at the ability of an ant 
colony optimization algorithm to evolve new architectures for storage area networks, 
in contrast to the traditional algorithmic techniques for automatically determining 
fabric requirements, network topologies, and flow routes. Then, Sasabe et al. propose
a new algorithm that considers the balance between supply and demand for media 
streams, inspired by biological systems, in P2P (peer-to-peer) networks. Le Boudec et 
al. investigate the use of an artificial immune system (AIS) to detect node 
misbehavior in a mobile ad hoc network using DSR (dynamic source routing), which 
is inspired by the natural immune system of vertebrates. Another network control
method inspired by biology is presented by Wakamiya et al., who adopt a pulse-coupled
oscillator model based on biological mutual synchronization, such as that used
by flashing fireflies, for realizing scalable and robust data fusion in sensor networks.

Chapter 7 is devoted to the application of bio-inspired approaches to image 
processing. Seiffert et al. develop a new image compression method based on 
artificial neural networks (ANN), and apply it to biomedical high-throughput 
screening (HTS). Next, Avello et al. develop naive algorithms for key-phrase 
extraction and text summarization from a single document, which are inspired by the
protein biosynthesis process. The last paper of this section is by Lee et al., who 
propose a biologically motivated trainable selective attention model based on an 
adaptive resonance theory network that can inhibit an unwanted salient area and only 
focus on an interesting area in a static natural scene. 

Chapter 8 contains several important topics. Oltean proposes an evolutionary 
approach for solving a problem for which random search is better than another 
standard evolutionary algorithm. Next, Kurihara analyzes a simple adaptive model of 
competition called the Minority Game, which is used in analyzing competitive 
phenomena in markets, and suggests that the core elements responsible for forming
self-organization are: (i) the rules place a good constraint on each agent's behavior, 
and (ii) there is a rule that leads to indirect coordination. Finally, Sorensen explores 
the genealogy of the contemporary biological influence on science, design and culture 
in general to determine the merits of the tendency and lessons to learn, and argues that 
biomimetics rests on bona fide scientific and technical grounds in the pursuit of 
dynamic IT, but also on other, more external factors. 

June 2004 Auke Jan Ijspeert 

Masayuki Murata 
Naoki Wakamiya 




Preface 



This state-of-the-art survey reports advancements of information technologies
inspired by biological systems. These advancements were comprehensively discussed
for the first time during the 1st International Workshop on Biologically Inspired
Approaches to Advanced Information Technology (Bio-ADIT 2004) held at EPFL,
Switzerland, on January 29-30, 2004. This book continues the discussion of
bio-inspired approaches for future information technologies, as perceived by the
authors, including achievements that originated with the authors. To ensure the
quality of the content, each paper was revised after the workshop took place,
according to the helpful comments made by the reviewers and the fruitful
discussions during the workshop.

Mimicking biological systems should probably not be an aim per se for engineers,
and one can see that many useful engineering developments have little to do with
biology (the use of metals, wheels, or rockets comes to mind). However, nature offers
us a large array of fascinating phenomena such as replication, self-organization,
self-repair, and adaptive behavior that are clearly of tremendous importance to
current and future information technologies. The contributions published in this
book underline the international importance of this field of research: the book
contains 37 contributions from Australia, Austria, Belgium, Canada, France, Germany,
Japan, Korea, Romania, Spain, Sweden, Switzerland, Taiwan, the UK, and the USA. This
strongly indicates the importance of the field and the worldwide movement into it.

The book consists of papers ranging from basic biological research to the 
application of biologically inspired algorithms and concepts to various aspects of 
information systems. The first chapter aims at providing a roadmap for the whole 
book and serves as a summary of the book's content. The book then moves to the 
analysis of biological systems for IT evolution. The following sections cover the 
application of biological inspiration to software, hardware, robotics, distributed/ 
parallel processing, network, and image processing systems. 

We wish to record our appreciation of the efforts of all the authors who helped to 
make this book happen, while we regret that it was not possible to include all the 
individual contributions. We are indebted to Daniel Mange and Shojiro Nishio, 
General Co-chairs, for managing the workshop. We would also like to again 
acknowledge the financial support from three sponsors for the original workshop: the
Osaka University Forum, the Swiss Federal Institute of Technology, Lausanne, and 
the 21st Century Center of Excellence Program of the Ministry of Education, Culture, 
Sports, Science and Technology (MEXT) of Japan under the program title “Opening 
up New Information Technologies for Building a Networked Symbiosis 
Environment.” 



June 2004 



Auke Jan Ijspeert 
Masayuki Murata 
Naoki Wakamiya 
Object-Oriented Specification of Complex
Bio-computing Processes: A Case Study of a Network of 
Proteolytic Enzymes 



Jacqueline Signorini and Patrick Greussay 



Artificial Intelligence Laboratory 
Université Paris-8, 93526 Saint-Denis - France
{sign,pg}@ai.univ-paris8.fr



Abstract. We propose a case study where a familiar but very complex and in- 
trinsically woven bio-computing system - the blood clotting cascade - is speci- 
fied using methods from software design known as object-oriented design 
(OOD). The specifications involve definition and inheritance of classes and 
methods and use design techniques from the most widely used OOD-language: 
the Unified Modeling Language (UML), as well as its Real-Time-UML extension.
First, we emphasize the need for a unified methodology to specify sufficiently
complex biological and biochemical processes. Then, using the blood clotting
cascade as an example, we define the class diagrams which exhibit the static
structure of procoagulant factors of proenzyme-enzyme conversions, and fi- 
nally we give a dynamic model involving events, collaboration, synchroniza- 
tion and sequencing. We thus show that OOD can be used in fields very much 
beyond software design, gives the benefit of unified and sharable descriptions 
and, as a side effect, automatic generation of code templates for simulation 
software. 



1 Introduction 

OOD for Non-software Complex Systems: 

Object-oriented programming (OOP) is now the major paradigm for software design. 
To specify and design very complex programs, OOP requires new methods recently
built upon description languages. Prior to programming, these languages yield a
complete specification for the states and processes which constitute the planned task.
Another reason for the success of these description languages (UML, Real-Time UML)
[1], [7], [6] has to do with the power of sharing a prototype description between
independent teams working in loose cooperation.

It is now well established that OOP gave birth to a large set of object-oriented design
(OOD) methods. This structured set of methods has become a recognized craft as well 
as a technical methodology. The methods give specifications kept separate from the 
target programs which will be subsequently hand-coded or computer-generated from 
the description yielded by OOD. 



A.J. Ijspeert et al. (Eds.): BioADIT 2004, LNCS 3141, pp. 1-12, 2004. 
The time has come to consider that OOD has all the power needed to specify complex
domains beyond program design. Until now, such domains could not rely on: 1/ a unified
and stable description methodology; 2/ a way of using concepts such as encapsulation,
class sharing and embedding, static properties and dynamic process inheritance;
3/ an opportunity to share and recombine descriptions built by separate research
teams; 4/ the ability to automatically generate the code templates for simulation
software associated with the specified situations.

We thus propose, using a typical example in bio-computing (the set of processes of
the blood clotting cascade), a case study which we hope demonstrates the need for a
unified description methodology. For that purpose, our tool will be the most widely
used and standardized language today for OOD: Unified Modeling Language
(UML)¹.

At this stage of development of biochemical models, we believe that it is necessary to 
use standard and stable object-oriented specification tools in contrast with non- 
standard tools such as Statecharts or LCS [13]: first, it will thus be possible to com- 
bine several specifications coming from unrelated research teams; second, we 
strongly believe that we need a large number of case studies in biochemical modeling 
before even thinking of trying to use the language of mathematics or formal semantics
(even after more than thirty years of software design research, formal methods
are still very far from being accepted as standard); last, it is probably too soon to
try an integrated multilevel description of a complex organism, even with reactive systems.
Biochemical processes are not formal objects, and we are at the very beginning of 
trying to build approximate computational models. What we need is a deeper under- 
standing of what biological components need to be adequately described; proof meth- 
ods, at this stage, would not be even mildly interesting. However, it is very possible 
that unexpected or unknown bio-processes could be exposed, as a side-effect of mod- 
eling. The last word, obviously, will come from biological validation. 



What Is the Blood Clotting Cascade: 

The process of blood clotting and the subsequent dissolution of the clot following 
repair of the injured tissue is termed hemostasis. It is based on an ordered series of 
proenzyme-enzyme conversions, also referred to as the proteolytic cascade [5].
Proteolytic enzymes are proteins that can cut other proteins into pieces. As they can
be extremely dangerous, they are usually formed and transported in the plasma as
proenzymes (zymogens), an inactive form which on activation undergoes proteolytic
cleavage to release the active factor from the precursor molecule. The coagulation
pathway functions as a series of positive and negative feedback loops which control
the activation process. It ensures the formation of the cross-linked fibrin clot that
plugs injured vessels and prevents blood loss through the action of thrombin. All the
plasma proteins involved in coagulation, mainly produced in the liver, are clotting
factors designated by Roman numeral descriptors. The numerals reflect the order of
their discovery rather than their sequence in the clotting cascade. Factors XII, XI, X,
IX, VII and prothrombin are proenzymes which are converted to active enzymes during
coagulation.



¹ We use Rational Rose 2000 UML, Rational Software Corp.
In contrast, factors V and VIII are cofactors. When a proenzyme is activated, an “a” is
added to the number. 
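
The naming convention can be illustrated with a minimal Python sketch. The class `ClottingFactor` and its fields are hypothetical names chosen for illustration only; they are not taken from the UML model described in this paper.

```python
from dataclasses import dataclass


@dataclass
class ClottingFactor:
    """A clotting factor identified by its Roman numeral (hypothetical model)."""
    numeral: str          # e.g. "XII"
    active: bool = False  # proenzymes circulate in the inactive form

    @property
    def name(self) -> str:
        # An activated factor carries an "a" suffix, e.g. XII -> XIIa.
        suffix = "a" if self.active else ""
        return f"factor {self.numeral}{suffix}"

    def activate(self) -> None:
        """Mark the factor as converted to its active form."""
        self.active = True


factor_xii = ClottingFactor("XII")
name_before = factor_xii.name
factor_xii.activate()
name_after = factor_xii.name
```

Here `name_before` holds the proenzyme name and `name_after` the activated name with the appended "a".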

Hemostasis consists of four major ordered events following the loss of vascular
integrity [8]. We briefly describe them:

1/ The initial phase or contact phase is vascular constriction. It involves the interac- 
tion of blood cells, platelets, and a set of proteins of blood plasma. When blood 
comes in contact with disrupted vascular vessels, platelets adhere to exposed collagen 
of the subendothelial tissues, release the contents of their granules and aggregate. 
Platelet aggregation ensues to form the primary hemostatic plug to temporarily arrest 
blood loss. Then, the ultimate goal of the proteolytic cascade is to produce thrombin 
which will convert soluble fibrinogen into fibrin to entrap the initial plug. The
generation of thrombin can be divided into the following three phases: the intrinsic
pathway and the extrinsic pathway, which provide alternative routes for the generation
of factor X (Stuart-Prower factor), and the final common pathway, which results in
thrombin activation.

2/ The contact phase is enhanced in the intrinsic pathway. It requires the clotting
factors XII, IX, XI, X and VIII, the proteins prekallikrein and high-molecular-weight
kininogen (HMWK) as well as calcium ions (Ca++) and phospholipids (PL) secreted
by platelets. Negatively charged surfaces such as collagen fibres activate factor XII
which, then, triggers clotting via the sequential activation of factors XI, IX and X.

3/ The main function of the extrinsic pathway (also called Tissue factor pathway) is to 
augment the activity of the intrinsic pathway. It provides a very rapid response to 
tissue injury. The two main components are Tissue factor (factor III) and factor VII as 
well as Ca++ and PL.

4/ The intrinsic and extrinsic pathways converge at factor X to a single common
pathway which is ultimately responsible for the production of thrombin (factor IIa).
Thrombin, in turn, converts fibrinogen to fibrin which then polymerizes to form the 
fibrin clot. After hemostasis is established, fibrin undergoes fibrinolysis, a process 
that prevents excessive fibrin deposit in the vasculature and dissolves the clot in order 
for normal blood flow to resume following tissue repair. The dissolution of the clot 
occurs through the action of plasmin [10], [12]. 
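
The convergence of the two pathways described in the four events above can be sketched as a small activation graph. This is only an illustrative, coarse-grained sketch: the node labels and the helper `activation_path` are assumptions made here, not part of the authors' UML specification, and the individual steps inside each pathway are deliberately omitted.

```python
from collections import deque

# Coarse activation graph summarizing the events described above.
# Node names are informal labels, not identifiers from the UML model.
ACTIVATES = {
    "intrinsic pathway": ["factor Xa"],
    "extrinsic pathway": ["factor Xa"],
    "factor Xa": ["thrombin"],
    "thrombin": ["fibrin"],
    "fibrin": ["fibrin clot"],
}


def activation_path(start, goal):
    """Return the shortest chain of activations from start to goal, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in ACTIVATES.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


intrinsic_route = activation_path("intrinsic pathway", "fibrin clot")
extrinsic_route = activation_path("extrinsic pathway", "fibrin clot")
```

Both routes share the same tail from factor Xa onward, which is the common pathway of the text.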

Why use UML to design such a system? First, it provides a structured analysis and
object orientation approach that fits this information system, where objects
are specific, strongly interconnected entities. Second, the proteolytic cascade which
directs the proenzyme-enzyme conversions maps directly onto a state diagram,
where the behavior of objects is identified when entering, exiting or existing in a
state, as well as the events accepted in that state. Third, activity and sequence
diagrams introduce timing tools such as synchronization bars and branch, contact or
merge points, which make it possible to sketch the clotting mechanism by message
passing and method triggering.
2 OOD Blood Clotting: Classes



In our model, four classes are defined to fit the specific hemostatic stages. Each class 
is a set of objects and objects (the corresponding blood factors, carrier proteins, co- 
factors) are instances of a class. 

A class is here represented as a three-segment box. The top segment has the class 
name, the middle segment contains the list of attributes and the bottom segment con- 
tains the list of operations or methods. Objects collaborate and exchange messages 
through the relationships defined between classes and objects. There are five ele- 
mentary types of relationships: association, aggregation, composition, generalization 
and dependency [6]. Here, the four classes designing the complex clotting process 
(Fig. 1) share three kinds of relationships. 




[Class diagram; recoverable label: Subendothelial Tissue]




Fig. 1. The four classes of the clotting process 

The contact phase class (detailed in Fig. 2), corresponding to the very first moment of
clot formation (exposure of blood plasma to collagen in a damaged vascular wall), is
connected to the intrinsic pathway class through an aggregation link characterizing its
logical and physical dependence on the latter. The link is shown with a diamond at
the owner end of the relationship.
The two following classes, intrinsic pathway and extrinsic pathway, are shown with
closed arrowheads indicating a generalization or inheritance relationship with the
common pathway subclass. In the UML, generalization means two things: inheritance
and substitutability. Inheritance means that a subclass has all its parent’s attributes,
operations, associations and dependencies. The subclass can extend and specialize the
inherited properties. By specialization, the subclass may polymorphically redefine an
operation to make it more appropriate to itself, and may even substitute a new
behavior for the inherited one [11].
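
Inheritance, specialization and substitutability as described above can be sketched in Python. The sketch assumes, following the text, that the common pathway is the subclass of both the intrinsic and extrinsic pathway classes; the method names `product`, `describe`, `trigger` and the function `report` are hypothetical and do not come from the authors' class diagram.

```python
class IntrinsicPathway:
    def product(self) -> str:
        return "factor Xa"

    def describe(self) -> str:
        # Calls product() polymorphically: a subclass override is used here.
        return f"{type(self).__name__} generates {self.product()}"


class ExtrinsicPathway:
    def product(self) -> str:
        return "factor Xa"

    def trigger(self) -> str:
        return "Tissue factor"


class CommonPathway(IntrinsicPathway, ExtrinsicPathway):
    # Inherits describe() and trigger(); redefines product() (specialization).
    def product(self) -> str:
        return "thrombin"


def report(pathway: IntrinsicPathway) -> str:
    # Substitutability: a subclass instance is accepted where the parent is expected.
    return pathway.describe()


parent_report = report(IntrinsicPathway())
child_report = report(CommonPathway())
```

The subclass reuses `describe()` unchanged, yet the redefined `product()` changes what it reports, which is the polymorphic redefinition the paragraph refers to.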

This linking type definition agrees with the common pathway process responsible for 
the production of thrombin. First, through positive feedback loops, thrombin quickly 
magnifies the hemostatic activity of some procoagulant factors in the intrinsic and 
extrinsic pathways: factors XI, VIII, V, XIII, X, IX, I (fibrinogen). Then, by negative
feedback loops, thrombin bound to thrombin inhibitors (antithrombin III, heparin
cofactor II, α2-macroglobulin, α1-antitrypsin) activates plasma proteins which degrade
or inhibit coagulation factors. This endogenous regulation limits the extent of the 
clotting cascade. 

The third relationship connecting the fibrinolytic pathway class to the common path- 
way class is a composition, a strong aggregation with bidirectional navigability. The 
fibrinolytic pathway class is a component class within the composite formed with the 
common pathway class. Components are included in the composite and only shared 
by the owner. The relationship is shown with a filled diamond at the composite end. 
Actually, in the clotting process, fibrinolysis is strongly dependent on the activation 
of thrombin, first step in the common pathway, which operates the proteolytic cleav- 
age of fibrinogen. 

Thrombin produces fibrin monomers which then polymerize to form fibrin strands 
and insoluble fibrin by activation of factor XIII, last enzymatic step in the coagulation 
process that introduces cross-links in fibrin polymers [10], [4]. 

Two actors, the stick figures, are drawn outside of the scope of the system and depict 
the interfaces that interact with the static design of the system. 

The numbers at the end of a relationship line denote the number of objects that
participate in the relation at each end. This is called the multiplicity of the role;
a many-valued multiplicity denotes a cardinality of multiple objects. The value 1 on
the composite role designates the necessary role of a composite, solely responsible
for the creation of part components.
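
The composition relationship between the common pathway and fibrinolytic pathway classes, with multiplicity 1 on the composite role, can be sketched as follows. This is a minimal illustration under stated assumptions: the attribute names `owner` and `parts` and the method `create_part` are hypothetical and only show one way to express "the composite alone creates its parts" with navigation in both directions.

```python
class FibrinolyticPathway:
    """Component: exists only inside the composite that created it."""

    def __init__(self, owner):
        self.owner = owner  # navigable from the part to its single composite


class CommonPathway:
    """Composite: solely responsible for creating its part components."""

    def __init__(self):
        self.parts = []  # navigable from the composite to its parts

    def create_part(self):
        part = FibrinolyticPathway(owner=self)
        self.parts.append(part)
        return part


common = CommonPathway()
part = common.create_part()
```

Each part refers to exactly one owner, while the owner may hold several parts.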



3 Contact Phase Objects 

Part of the intrinsic pathway, the contact phase clotting process is initiated in
response to negatively charged materials of the subendothelial connective tissues. The
central enzyme factor is factor XII (Hageman factor). 

In Fig. 2, six objects representing plasma factors are connected through association or 
aggregation relationships. Objects have both data and behaviors. Data given in the
middle segment of the notation are attributes or properties that define the object,
modify the object, and are acted upon by it. As in the class notation, the action list
defines the set of possible operations or behaviors of an object.
Fig. 2. The contact phase class and objects



In the object diagram, prekallikrein (Fletcher factor) is the zymogen of a serine
protease which circulates in the blood. It is involved in the surface activation of
factor XII and is converted to kallikrein by factor XIIa; kallikrein in turn further
activates factor XII. It also participates in the conversion of plasminogen to plasmin
and in kinin formation.

Kallikrein is a plasma serine protease present as inactive prekallikrein. Kallikrein has 
direct chemotactic activity and has the capacity to cleave C5 to C5a. C5a
(anaphylatoxin) causes release of granules containing vasoactive amines from mast
cells. Kallikrein acts on kininogens to produce bradykinin, a vasoactive amine that contracts
smooth vessels. 

High Molecular Weight Kininogen (HMWK, Fitzgerald factor) circulates as a complex
with kallikrein and factor XI. It serves as a carrier protein for factor XII and
factor XI, maximizing their interactions.

Activated factor XII, catalyzed by HMWK and kallikrein, converts more prekallikrein to
kallikrein, establishing a reciprocal activation cascade such that it can activate
factor XI, the next step in the intrinsic pathway, which requires Ca++ to proceed
efficiently. It must be noted that tests in vitro [3] have shown that patients with
factor XII, kallikrein or HMWK deficiencies do not tend to have bleeding disorders,
whereas factor XI (second step) is required in hemostasis. As for the class diagram
(Fig. 1), the relationships indicate conversion to serine protease by an aggregation
link and catalysis by an association link with directional navigability.



4 Contact Phase State Diagram 

Object analysis is concerned with identifying the essential set of objects, their
changes and their relationships. The dynamic behavior of those logical objects is
then specified through object behavior designs such as state diagrams. The state
diagrams in Fig. 3 provide finite state machines of the contact phase model.

Six states are connected by direct transitions. When an event occurs (here, the
exposure of negatively charged membranes to blood plasma), the object transitions to
another state as a result of accepting the event or performing an action. As soon as
this new state is entered, an implicit timer regulates the exiting timeout transition
or amplifies it by self-triggering. The transition itself can be characterized by
parameters, guards or boolean expressions, as well as an action list.
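
The transition mechanism described above (events, guards, action lists, timeout transitions) can be sketched as a small finite state machine. This is an illustrative sketch only: the class `StateMachine`, the state names and the event names are hypothetical and are not the six states of Fig. 3a; the timeout is modeled as an ordinary event rather than a real timer.

```python
class StateMachine:
    def __init__(self, initial):
        self.state = initial
        self.transitions = {}  # (state, event) -> (guard, actions, target)
        self.log = []

    def add(self, source, event, target, guard=lambda ctx: True, actions=()):
        self.transitions[(source, event)] = (guard, tuple(actions), target)

    def fire(self, event, ctx=None):
        """Take the transition for `event` if one exists and its guard holds."""
        ctx = ctx or {}
        entry = self.transitions.get((self.state, event))
        if entry is None:
            return False
        guard, actions, target = entry
        if not guard(ctx):
            return False
        for action in actions:  # run the action list before entering the target
            action(self)
        self.state = target
        return True


fsm = StateMachine("resting")
fsm.add("resting", "surface_exposed", "factor_XII_activated",
        guard=lambda ctx: ctx.get("negatively_charged", False),
        actions=[lambda m: m.log.append("activate factor XII")])
fsm.add("factor_XII_activated", "timeout", "factor_XI_activated")

blocked = fsm.fire("surface_exposed", {"negatively_charged": False})
taken = fsm.fire("surface_exposed", {"negatively_charged": True})
fsm.fire("timeout")
```

The first `fire` call is rejected by the guard; the second is accepted and runs the action list; the `timeout` event then moves the machine to the next state.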

The most noticeable feature of the UML state diagram is the nesting of states within
states. As shown in Fig. 3b, the contact phase state diagram, the first step in platelet 
aggregation (see Sect. 3), is nested in the intrinsic pathway state diagram. 

The state diagram is a dynamic view of the entire state space of a system. In the
remaining part of this paper, we describe three more models for the clotting
mechanism, each one focusing on a specific view of the entire state space and
contributing to the full definition of its dynamic behavioral design.




Fig. 3a. The contact phase state diagram 
[State diagram; recoverable label: Second step in the Intrinsic Pathway (Factor XI)]



Fig. 3b. Nested states 



5 Blood Clotting Collaboration Diagram 

The first model, the collaboration diagram, shows the structural organization
of the objects that participate in a given set of messages. Here, it provides
the underlying structural context of the clotting process, identifying the
object relationships among the four classes previously defined. Early in
analysis, the structural context is the use case diagram: use cases are defined
by collaborations of objects broadly identified with actors and the system.
Later, analysis captures design details and decomposes the system into objects,
and use case scenarios are refined by adding new levels of detail. Fig. 4 builds
the clot formation scenario, showing the set of objects working together. It
provides an order-dependent view of how the clotting system is expected to
behave.

In the scenario, the extrinsic pathway cascade (right-hand side) is shown as an
alternative route to factor Xa. When flowing blood is exposed at the site of
injury, tissue factor (factor III) is released. In the presence of calcium and
phospholipids, tissue factor and factor VII form the complex TF-VIIa, which
catalyses the activation of both factor X and factor IX. Factor Xa and cofactor
V react with prothrombin to generate thrombin. The remainder of the cascade is
then similar to the intrinsic pathway. The TF-VIIa complex is rapidly
inactivated by tissue factor pathway inhibitor (TFPI) [2].

The feedback loops starting from the thrombin notation box with reversed
(bottom-up) arrows show the regulatory mechanism of the clotting cascade, which
serves two functions: (1) to limit the amount of fibrin clot formed, avoiding
ischemia (blood shortage) of tissues; and (2) to localize clot formation and
prevent widespread thrombosis.

Fig. 4. The blood clotting collaboration diagram



6 Extrinsic Pathway Activity Diagram 

The following diagram (Fig. 5), called an activity diagram, provides a more
refined view of the structural organization of the extrinsic pathway as
designed and briefly explained in the previous section. It shows the flows
among activities associated with the objects, including transitions, branches,
merges, forks and joins. It can be used to specify an algorithm or a detailed
computation.

In Fig. 5, four synchronization bars are drawn as long, black rectangles.
Actions are executed in the same order as states, but transitions between
states can be subject to concurrency. Indeed, some states behave as and-states
that require synchronization of propagated events. Thrombin is an example: it
sends stimuli (a fork bar) to states that become active and-states and then, at
some point, joins control (a join bar) from multiple source and-states to end
its activity and prevent bleeding disorders. Hemophilia A, for example, is an
X-linked disorder resulting from a deficiency in factor VIII [8], whose
activation must occur via proteolytic cleavage by thrombin and factor Xa.




Fig. 5. The extrinsic pathway activity diagram 



7 Sequence Clotting Diagram 

The collaboration diagram emphasizes the relationships between the objects. The
sequence diagram emphasizes the sequence of messages sent between them.

At the top of the diagram, rectangles give the object identities. Their names
are underlined to distinguish them from classes. The dashed vertical line is
known as the lifeline: the time axis of the diagram, directed downward. The
arrows between the lifelines represent messages being sent between the objects.
The white rectangles are called activations and show the duration of a method
execution in response to a message. Fig. 6 shows our understanding of the
sequence of messages and methods between objects. The VII-TF complex designates
the cofactors for the extrinsic pathway, the V-X complex the cofactors for the
common pathway, and the VIII-IX complex the cofactors for the intrinsic pathway.

The upper part of the diagram shows the activation of procoagulant factors.
Starting with the activate_factorX() method in the VII-TF complex, the event
cascades, triggering new methods in objects up to the point where changes in
object activation promote blood clotting through positive feedback loops. The
lower part of the diagram is driven by clot restriction messages, through
methods that inhibit procoagulant factors and activate plasmin.
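
The message cascade described above can be sketched as objects that call each
other's methods while a shared log records the order of messages, much as a
sequence diagram would. Only activate_factorX() is named in the text; the class
names, the other method names, and the restriction to three downstream steps
are assumptions made for illustration, not the model of Fig. 6.

```python
from typing import List


class MessageLog:
    """Records messages in the order they are sent (top to bottom)."""

    def __init__(self) -> None:
        self.entries: List[str] = []

    def record(self, sender: str, receiver: str, message: str) -> None:
        self.entries.append(f"{sender} -> {receiver}: {message}")


class Fibrinogen:
    def __init__(self, log: MessageLog) -> None:
        self.log = log
        self.is_fibrin = False

    def convert_to_fibrin(self, sender: str) -> None:  # hypothetical name
        self.log.record(sender, "Fibrinogen", "convert_to_fibrin()")
        self.is_fibrin = True


class Prothrombin:
    def __init__(self, log: MessageLog, fibrinogen: Fibrinogen) -> None:
        self.log = log
        self.fibrinogen = fibrinogen
        self.active = False

    def activate(self, sender: str) -> None:  # hypothetical name
        self.log.record(sender, "Prothrombin", "activate()")
        self.active = True  # prothrombin becomes thrombin
        self.fibrinogen.convert_to_fibrin("Thrombin")


class FactorX:
    def __init__(self, log: MessageLog, prothrombin: Prothrombin) -> None:
        self.log = log
        self.prothrombin = prothrombin
        self.active = False

    def activate(self, sender: str) -> None:  # hypothetical name
        self.log.record(sender, "Factor X", "activate()")
        self.active = True  # factor X becomes factor Xa
        self.prothrombin.activate("Factor Xa")


class VIITFComplex:
    def __init__(self, factor_x: FactorX) -> None:
        self.factor_x = factor_x

    def activate_factorX(self) -> None:
        # Entry point of the cascade, as named in the text.
        self.factor_x.activate("VII-TF complex")


def run_cascade() -> List[str]:
    log = MessageLog()
    fibrinogen = Fibrinogen(log)
    prothrombin = Prothrombin(log, fibrinogen)
    factor_x = FactorX(log, prothrombin)
    VIITFComplex(factor_x).activate_factorX()
    return log.entries
```

Each nested call corresponds to an activation rectangle on the callee's
lifeline; inhibition messages and feedback loops are omitted from this sketch.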

Detailed timing analysis should add time values to the diagram as constraints
or synchronization patterns. However, those time values can only be
investigated through coagulation in vitro and are themselves elements of a
comprehensive model.

Fig. 6. The blood clotting sequence diagram



8 Conclusion 

Object-oriented design (OOD) methods are valuable techniques not only for
planning and prototyping complex software but also for building conceptual
domain models. In the present paper, choosing a very complex bio-computing
process as a case study (the blood clotting cascade [5]), we demonstrate the
capability of object-oriented methodology to specify domain expertise and to
investigate, through descriptive diagrams, the constructive views of the domain.

We have used the core techniques of the standard Unified Modeling Language
(UML) [1], [6] from OOD. Each one provides a specific description of the blood
clotting process. Class diagrams give a structured organization of blood
factors with their relationships. State and collaboration diagrams build the
finite state machine of the process. Finally, with activity and sequence
diagrams, we describe time-ordered proenzyme-enzyme activations, which may
support conditional and parallel behavior. This approach lends itself to
obvious extensions; in a forthcoming paper, we will describe several other
metabolic pathways, especially gluconeogenesis in the liver and kidneys.



Abstract. The responses of flexible bionetworks to extreme environmental
changes were studied in several microorganisms. Gene and metabolic networks
changed markedly with osmotic pressure in Saccharomyces cerevisiae. Even though
large perturbations were observed in the gene network, the metabolic network
changed only moderately owing to the constraints of redox and energy balances.
A self-organizing map (SOM) was a useful tool for analyzing massive amounts of
complex bionetwork data such as gene expression profiles. In a study of
glutamate production by Corynebacterium glutamicum and Corynebacterium
efficiens, it was concluded that a single enzyme is chiefly responsible for
controlling flux redistribution at a key branch point. The flux of the
biodegradable-polymer production pathway of Paracoccus denitrificans was well
controlled by manipulating the feeding rate of carbon sources. The
heterogeneity of the population of Escherichia coli cells in a continuous
culture was studied, and it was suggested that cell-cell interaction is also an
important factor controlling bionetworks. The characteristics of flexible and
complex bionetworks should be related to the flexibility, robustness, and
tolerance of microorganisms under extreme environmental conditions.



1 Introduction 

Living organisms are complex systems with multidimensional hierarchical
networks composed of genes, proteins, and metabolic networks, as shown in Fig.
1. Living cells can flexibly change the topology of their complex bionetworks
to survive under extreme environmental conditions. When cells encounter extreme
environmental conditions such as high temperature, high osmotic pressure, low
oxygen concentration, or nutrient depletion, the expression patterns of genes,
proteins and metabolic networks change and rearrange markedly. The ability to
rearrange bionetworks and the complex topology itself are considered to be
closely associated with the flexibility and robustness of living organisms
against extreme environmental conditions. The analysis of the features of these
bionetworks is very important not only for elucidating how complex bionetworks
are organized but also for generating next-generation information technologies
(ITs). The characteristics of bionetworks, i.e., diversity, scalability,
robustness, and resilience, are expected to inform the development of
next-generation ITs. The analysis of complex bionetworks and the determination
of their significance should contribute to the development of novel
bio-inspired information technologies.



Fig. 1. Schematic of a complex bionetwork (figure title: Hierarchical networks
in microorganisms)



Many genome projects have been completed, and information on the entire genomes
of many organisms is now available for analysis. With the recent and
significant progress in experimental and analytical techniques for the
expression of genes (DNA arrays or DNA chips), proteins (two-dimensional (2D)
electrophoresis and mass spectrometry (MS)), and metabolic networks (metabolome
and metabolic flux distribution analysis (MFA)), it has become possible to
analyze information on complex bionetworks in living organisms. These
techniques facilitate the comprehensive visualization of the profiles of
bionetworks.

A metabolic pathway is a sequence of biochemical reactions; together, such
pathways comprise all living activities in cells, namely, the uptake of
nutrients from outside the cell, the generation of energy and reducing power in
catabolic pathways, the construction of building blocks such as amino acids,
nucleotides, fatty acids, and carbohydrates, and the polymerization of
proteins, lipids, and polysaccharides from building blocks in anabolic
pathways. Since the set of flows (or fluxes) of all metabolic reactions is
directly associated with the physiology of cells, it is possible to understand
the physiological states of cells by capturing the distribution of reaction
rates in metabolic pathways. It is very important to understand biologically
how cells change their metabolic flux distribution.



1.1 Metabolic Engineering 

Metabolic engineering emerged in the 1990s and was defined as a methodology for
the targeted improvement of product formation or cellular properties through
the systematic analysis of metabolic pathways and the modification of specific
biochemical reactions. The importance of analyzing the network rigidity in
primary metabolic pathways [1] and of recruiting heterologous activities, such
as heterologous enzymes and transport systems [2], has been emphasized.

One of the great contributions of metabolic engineering to biology and
biotechnology is the integration of the macroscopic analysis of entire
bioprocesses such as fermentation with the microscopic analysis of
intracellular metabolic reactions such as networks of enzyme reactions. A set
of reaction rates in a metabolic pathway is called a metabolic flux
distribution, and cell physiology can be interpreted and understood by
metabolic flux distribution analysis (MFA) [3] under the investigated
fermentation conditions. The impact of a genetic engineering strategy on the
improvement of a metabolic pathway in a host microorganism can also be
discussed by comparing the flux distribution of a wild-type microorganism with
that of a recombinant microorganism in which a specific gene is introduced,
enhanced, or deleted; this is called metabolic control analysis (MCA) [4]. The
primary goal of metabolic engineering is a strategy for improving metabolic
pathways and generating superior cells, namely, the development of a systematic
algorithm for modification. It uses many tools of bioinformatics, such as
genome databases, DNA arrays, 2-D electrophoresis, and mass spectrometry [5],
[6]. Analyzing the responses of complex bionetworks to perturbations by
artificial gene networks and to environmental changes would clarify the
characteristics of bionetworks.

In this paper, the flexible responses of bionetworks to environmental changes
are discussed using several examples: (i) responses of the gene and metabolic
networks of the yeast Saccharomyces cerevisiae to osmotic pressure induced by
NaCl addition, (ii) clustering analysis of the gene network of S. cerevisiae
using a self-organizing map (SOM), (iii) metabolic flux redistribution of
coryneform bacteria in response to biotin depletion and the addition of a type
of detergent material, (iv) metabolic flux distribution in the production of
the biodegradable polymer poly-(3-hydroxybutyrate-co-3-hydroxyvalerate),
P(HB-co-HV), by Paracoccus denitrificans, and (v) heterogeneity of a cell
population in a continuous culture of Escherichia coli.



2 Bionetwork Analysis of S. cerevisiae Against Osmotic
Pressure of NaCl Addition



The yeast Saccharomyces cerevisiae is widely used in fermentation and brewing
processes. In this study, S. cerevisiae cells were exposed to stress induced by
an increase in osmotic pressure, and the global gene expression profile and
flux distribution of the central-carbon metabolism were analyzed. Under
high-osmotic-pressure conditions, we compared a laboratory strain of S.
cerevisiae (FY834) with an industrially used, osmotic-stress-tolerant brewing
strain (IF02347) used for the production of Japanese sake (rice wine). Both
strains were cultivated in a liquid medium. NaCl (1 or 0.5 M) was added to this
medium 5 hours after the cultivation was started. Gene expression was compared
between the two strains before and after the NaCl addition using DNA
microarrays. Figure 2 shows the scatter plots of the gene expression of both
strains when NaCl was added at a final concentration of 1 M.



[Figure 2: scatter-plot panels labeled FY834; time points 15 min, 30 min, and
45 min]

Fig. 2. Time courses of scatter plots of gene expression in the laboratory
strain (FY834) and the brewing strain (IF02347) of S. cerevisiae when NaCl was
added at a final concentration of 1 M. Horizontal and vertical axes are gene
expression before and after NaCl addition.



The gene expression of both strains changed markedly after the NaCl addition.
The dispersion of the gene expression level of FY834 was greater than that of
IF02347. The response of gene expression to the 0.5 M NaCl stress was faster
than that to the 1 M NaCl stress. Both the dynamics and the magnitude of the
gene expression change should be studied to analyze how the gene network
rearranges under stress.

Figure 3 shows the response of genes encoding enzymes involved in the
central-carbon metabolism of strain FY834. The genes encoding the enzymes
involved in acetate, glycerol and trehalose production were up-regulated in
both strains under the stress conditions. However, the genes encoding the
sodium ion exclusion pump and copper ion-binding proteins were more strongly
up-regulated in the brewing strain than in the laboratory strain. The
up-regulation of these genes may help to explain the osmotic stress tolerance
of the brewing strain.

Fig. 3. Response of genes encoding enzymes involved in central carbon
metabolism in FY834. Numbers in the figure indicate the ratio of the gene
expression level under a 1 M NaCl stress condition to that without NaCl
addition (control).



The flux distributions of the central-carbon metabolism were also analyzed in
both strains cultivated under high-osmotic-stress conditions. Cells of both
strains were cultivated in a synthetic medium in a 5 L jar fermentor, and the
concentrations of intracellular and extracellular metabolites such as ethanol,
acetate, glycerol, and trehalose were measured during exposure to NaCl stress.
The flux distributions under high-osmotic-stress conditions were determined
using a metabolic reaction model and the metabolite measurement data.
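
For orientation, the standard formulation behind this kind of flux
determination can be written as follows. The paper does not give its model
equations, so this is a generic sketch under the usual pseudo-steady-state
assumption; the symbols $S$, $v$, $S_m$, $S_u$, $v_m$, $v_u$ and $\tilde{v}_i$
are notation introduced here, not the authors'.

```latex
% Pseudo-steady-state balance on intracellular metabolites:
% S is the stoichiometric matrix, v the vector of reaction rates (fluxes).
\[ S\,v = 0 \]

% Partition the fluxes into measured (m) and unknown (u) parts and solve for
% the unknown ones (S_u^{+} is the pseudo-inverse, used when the system is
% overdetermined).
\[ S_m v_m + S_u v_u = 0
   \quad\Longrightarrow\quad
   v_u = -\,S_u^{+} S_m v_m \]

% Normalization by the glucose consumption rate, expressed as a percentage.
\[ \tilde{v}_i = 100 \times \frac{v_i}{v_{\mathrm{glucose}}} \]
```

The last line corresponds to the normalization by glucose consumption rate
mentioned for the reported molar fluxes.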

Figure 4 shows the metabolic flux distribution in FY834 under high osmotic
pressure conditions (1 M NaCl). For the laboratory strain, the carbon fluxes
through the production pathways for glycerol and acetate increased and those
for ethanol production and the TCA cycle decreased 2 hours after the addition
of 1 M NaCl. For the brewing strain, the carbon flux to glycerol production
likewise increased and that to ethanol production decreased, but the carbon
flux to acetate production decreased 2 hours after the NaCl addition. Glycerol
and trehalose production in a cell is enhanced for tolerance to high osmotic
pressure. This study confirmed that gene and metabolic networks respond to
high-osmotic environmental conditions. However, comparing the change in the
gene expression pattern with the change in the metabolic flux distribution
showed that the gene network responds much faster. Even though large
perturbations were observed in the gene network, the metabolic network changed
only moderately owing to the constraints of redox and energy balances.



[Figure 4 legend: Flux (Stress 6 h, Stress 8 h, Control 8 h); node labels
include Glc_ext]

Fig. 4. Metabolic flux distribution in FY834 under high-osmotic-pressure
conditions (1 M NaCl). The numbers in the figure indicate the molar fluxes of
metabolites through metabolic reactions. The molar flux is normalized by the
glucose consumption rate and expressed as a percentage.



3 Clustering Analysis of Gene Network by 
Self-Organizing Map (SOM) 

To analyze the interconnection of gene networks and to elucidate the role of
each gene, gene expression data are very useful because feedback and
feedforward regulation mechanisms are involved in such networks. As described
in the last section, gene expression changed markedly with environmental
changes. Since a massive amount of data exists in DNA arrays, the primary goal
of systematic expression analysis is to cluster expression data without
knowledge of biological concepts such as molecular mechanisms. In this study, a
SOM model was used for clustering [7]. To perform cluster analysis, biological
and statistical indices were utilized. The efficiency of the scheme proposed
for analyzing gene networks was validated using a test dataset in which the
correct relationship of gene expression patterns is known a priori. As a
practical example, the previous study of Cho et al. [8], published on the
website http://genomics.stanford.edu, was adopted.

The SOM is a well-known unsupervised neural learning algorithm. Its concept of
self-organization is based on competitive learning implemented in an
unsupervised type of network. The dimension of the map in this paper is $n$ by
$n$ neurons, or $n^2$ neurons in total ($i = 1$ to $n^2$), which means that
$n^2$ typical expression patterns will be extracted by clustering analysis with
the SOM. The input vector is defined as the $k$-dimensional vector $X_j$
[$k \times 1$], where $j$ indexes the genes in the dataset
($j = 1, 2, 3, \ldots, 1019$) and $k$ is the number of time points ($k = 16$).
Each element of the input vector $X_j$ is fully connected to the neurons in the
SOM layer. Neurons in the map of the SOM layer have vectors with the same
dimension, 16, as that of the input vector $X_j$, called weight vectors $W_i$
($i = 1$ to $n^2$).

At iteration $t$ in the training, the distance $d(X_j, W_i)$ is defined and
used as the measure of similarity between $X_j$ and $W_i$. In this paper, the
Euclidean distance metric was used. The neuron closest to the input vector is
selected as the winning neuron, with the corresponding weight vector $W_c$,
where $c$ is the index of the winning neuron as shown in Eq. (1). When the
winning neuron is selected, the weights of the neurons at iteration $(t+1)$ are
updated by Eq. (2) with the two parameters $N_c(i,t)$ and $\alpha$.

$$c(t) \text{ satisfies: } \| X_j(t) - W_c(t) \| = \min_i \| X_j(t) - W_i(t) \| \qquad (1)$$

$$W_i(t+1) = W_i(t) + \alpha(t)\, N_c(i,t)\, [X_j(t) - W_i(t)] \quad \text{for } i \in N_c(i,t) \qquad (2)$$

$$W_i(t+1) = W_i(t) \quad \text{for all other indices } i \qquad (3)$$

Here $N_c(i,t)$ is a parameter that indicates the neurons in the neighbourhood
around the winning neuron. Neurons belonging to $N_c(i,t)$ are updated, while
the other neurons are not, as in Eq. (3). The parameter $\alpha(t)$ is the
learning rate, which is a monotonically decreasing function of $t$, and
$N_c(i,t)$ denotes the neighbourhood size of the winning neuron $c$ at
iteration $t$. Generally, the function $\alpha$ decreases nonlinearly from
approximately unity to zero. The neighbourhood $N_c(i,t)$ also decreases as the
number of iterations increases.
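
The training loop of Eqs. (1) to (3) can be sketched in a few lines of plain
Python. This is a minimal sketch, not the implementation used in the study: the
exponential learning-rate schedule, the linearly shrinking neighbourhood
radius, the random initialization, and the treatment of the neighbourhood term
as a 0/1 membership indicator are all assumptions, and the function names are
introduced here for illustration.

```python
import math
import random
from typing import List

Vector = List[float]


def euclidean(x: Vector, w: Vector) -> float:
    """Euclidean distance d(X_j, W_i)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, w)))


def winner(x: Vector, weights: List[Vector]) -> int:
    """Eq. (1): index c of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: euclidean(x, weights[i]))


def train_som(data: List[Vector], n: int, iterations: int, seed: int = 0) -> List[Vector]:
    """Train an n-by-n map and return its n*n weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n * n)]
    for t in range(iterations):
        frac = t / iterations
        alpha = math.exp(-5.0 * frac)      # assumed schedule: ~1 down to ~0
        radius = (n / 2.0) * (1.0 - frac)  # assumed shrinking neighbourhood
        x = data[rng.randrange(len(data))]
        c = winner(x, weights)
        c_row, c_col = divmod(c, n)
        for i in range(n * n):
            row, col = divmod(i, n)
            # Eq. (2) for neurons inside the neighbourhood of the winner;
            # Eq. (3): all other weight vectors are left unchanged.
            if math.hypot(row - c_row, col - c_col) <= radius:
                weights[i] = [w + alpha * (xk - w) for w, xk in zip(weights[i], x)]
    return weights
```

For expression profiles as described in the text, each element of `data` would
be one gene's 16-point time course, and the returned weight vectors play the
role of the extracted typical patterns.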

In this study, the size of the SOM was determined with statistical and
biological measures using the Akaike information criterion (AIC) [9], and some
gene clusters related to the cell cycle (G1, G2, S and M phases) were
automatically extracted. The extracted genes coincided with genes previously
reported as cell-cycle-dependent genes. Thousands of gene expression patterns
were clustered into a much smaller number of groups, namely, typical patterns
extracted by comparing their similarities in terms of the gene expression data
with the SOM, as shown in Fig. 5. From this result it was concluded that the
SOM is useful for the analysis of complex bionetwork data, especially as a data
compaction tool.




Fig. 5. Clustering of gene expression patterns. Gene clusters related to cell
cycles were automatically extracted. (Axis label: time point.)



4 Flux Redistribution in Coryneform Bacteria in Amino 
Acid Production 

The coryneform bacterium Corynebacterium glutamicum was isolated in the 1950s
by Japanese researchers. This microorganism can produce large amounts of amino
acids such as glutamate and lysine. There are several triggers for glutamate
overproduction: depletion of biotin, which is required for cell growth, and the
addition of a detergent material or of lactam antibiotics such as penicillin.
A marked change in metabolic pathways is induced by these triggering
operations. However, it is not yet completely understood why such a significant
change in metabolic pathways is induced by these triggers. The analysis of the
significant impact of small, specific triggers on metabolic networks is
therefore necessary. In glutamate production, the impact of activity changes in
the enzymes around a key branch point, namely 2-oxoglutarate, on the target
flux of glutamate was quantitatively analyzed by comparing the flux
distributions of a parent strain and genetically modified strains [10]. A
metabolic reaction (MR) model was constructed for the central-carbon metabolism
and glutamate synthesis pathway, and the consistency of the model was checked
statistically. The metabolic flux distribution, particularly the flux
distribution at the 2-oxoglutarate branch point, was investigated in detail.
The enzyme activities of isocitrate dehydrogenase (ICDH), glutamate
dehydrogenase (GDH), and the 2-oxoglutarate dehydrogenase complex (ODHC) at the
branch point were changed using two genetically engineered strains under
controlled environmental conditions. GDH and ICDH activities were enhanced by
transformation with plasmids containing the homologous gdh and icd genes,
respectively. The ODHC activity was attenuated under biotin-deficient
conditions.

Figure 6 shows the redistribution of fluxes around the 2-oxoglutarate branch
point. It was found that the enhancement of ICDH and GDH activities does not
significantly affect the flux distribution at the 2-oxoglutarate branch point.
Even though ICDH and GDH activities increased to 3.0 and 3.2, respectively,
more than 70 percent of the carbon flux still flowed into the TCA cycle. On the
other hand, when the ODHC activity decreased to approximately 50 percent after
the biotin depletion, marked changes in the fluxes of GDH and ODHC were
observed. More than 75 percent of the carbon was directed to glutamate
production. These results show the effects of the three enzymes on glutamate
production. The most important point is that ODHC attenuation has the greatest
impact on glutamate production. The increase in ICDH activity had the
second-greatest impact, and the increase in GDH activity was not effective for
glutamate production.

Exactly the same flux redistribution was observed upon the addition of a
detergent material in both C. glutamicum and another coryneform bacterium,
Corynebacterium efficiens. In both strains, GDH and ICDH activities did not
change significantly throughout the fermentation, but ODHC activity decreased
significantly due to biotin depletion. Flux redistribution clearly occurred
after the decrease in ODHC activity. The difference in glutamate production
between C. glutamicum and C. efficiens results from the difference in the
degree of decrease in ODHC activity. The differences in the Michaelis-Menten
constants (Km values) of ICDH, GDH, and ODHC explain the mechanism of flux
redistribution at the α-KG branch point. It was found that the Km values of
ICDH and ODHC were much lower than that of GDH in both strains. These results
suggest that ODHC is more sensitive to a change in 2-oxoglutarate concentration
than GDH. ODHC is the most crucial factor for the control of flux distribution
at the α-KG key branch point in coryneform bacteria.
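
A toy calculation can illustrate why the capacity of a low-Km enzyme matters
more at a branch point than that of a high-Km competitor. The sketch below
assumes simple Michaelis-Menten kinetics for the two enzymes competing for
2-oxoglutarate and a fixed incoming flux; every numerical value (Vmax, Km,
incoming flux) is hypothetical and chosen only for illustration, not fitted to
the measurements reported here.

```python
def mm_rate(vmax: float, km: float, s: float) -> float:
    """Michaelis-Menten rate v = Vmax * s / (Km + s)."""
    return vmax * s / (km + s)


def gdh_fraction(
    j_in: float,
    vmax_odhc: float,
    km_odhc: float,
    vmax_gdh: float,
    km_gdh: float,
) -> float:
    """Fraction of the incoming flux that leaves through GDH at steady state.

    The substrate concentration s is found by bisection so that the two
    outgoing rates sum to the incoming flux j_in.
    """
    if not 0.0 < j_in < vmax_odhc + vmax_gdh:
        raise ValueError("incoming flux must be below the total capacity")

    def outflow(s: float) -> float:
        return mm_rate(vmax_odhc, km_odhc, s) + mm_rate(vmax_gdh, km_gdh, s)

    lo, hi = 0.0, 1.0
    while outflow(hi) < j_in:  # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):       # outflow is increasing in s, so bisection works
        mid = 0.5 * (lo + hi)
        if outflow(mid) < j_in:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    return mm_rate(vmax_gdh, km_gdh, s) / j_in


# Hypothetical parameters: ODHC has a much lower Km than GDH.
J_IN, KM_ODHC, KM_GDH = 80.0, 0.1, 5.0
base = gdh_fraction(J_IN, 100.0, KM_ODHC, 200.0, KM_GDH)
gdh_doubled = gdh_fraction(J_IN, 100.0, KM_ODHC, 400.0, KM_GDH)
odhc_halved = gdh_fraction(J_IN, 50.0, KM_ODHC, 200.0, KM_GDH)
```

With these assumed numbers, halving the ODHC capacity shifts far more flux
toward GDH than doubling the GDH capacity does, which is qualitatively the
pattern described in the text.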



5 Metabolic Flux Distribution in Biodegradable Polymer, 
Poly-(3-Hydroxybutyrate-co-3-Hydroxyvalerate), 
P(HB-co-HV), of Paracoccus denitrificans 

Biodegradable polymers, polyhydroxyalkanoates (PHAs), are accumulated by many
types of microorganism as reserves and are degraded in nature. Since the
mechanical and thermal properties of the copolymer
poly(3-hydroxybutyrate-co-3-hydroxyvalerate) (P(HB-co-HV)) are superior to
those of the homopolymer PHB, the production and quality control of the
copolymer P(HB-co-HV) is a matter of concern. The flux distribution in the
metabolic pathway changes drastically when nitrogen in the culture medium is
depleted. Because the fluxes to polymer production from different carbon
sources are very important and the molar fractions of monomer units in the
copolymer determine its thermal and mechanical properties, the analysis and
control of monomer units are very important.

Fig. 6. Flux distribution at the 2-oxoglutarate (α-KG) branch point in
glutamate production. The flux distribution of the wild-type strain (a) was
perturbed by enhancement of the icd gene expression (b), enhancement of the gdh
gene expression (c), and attenuation of ODHC activity by biotin depletion (d).

In this study, Paracoccus denitrificans was used and analyzed for P(HB-co-HV)
production. Ethanol and n-pentanol were used as carbon sources. To control the
molar fraction of 3HV units in the polymer, a simplified metabolic reaction
(MR) model of the PHA synthesis pathway was developed, in which a
nitrogen-deficient condition was assumed (e.g., a molar ratio of carbon to
nitrogen (C/N) of 50). A simplified metabolic map is shown in Fig. 7. Since the
nitrogen source was very limited, the specific growth rate became almost zero,
and anabolic pathways were assumed to be negligible. In the simplified pathway,
intermediate metabolites except those at key branch points were removed because
their concentrations must be constant in the steady state [11].

For the polymerization of P(HB-co-HV), NADPH is necessary in the reaction
involving 3-ketoacyl-CoA reductase. Thus, the regeneration of NADPH in the
reaction involving isocitrate dehydrogenase in the TCA cycle should be included
in the model. When the metabolic pathway for P(HB-co-HV) production and the
molar fraction of 3HV units in the polymer are considered, two branch points
are significant: that of 3-ketoacyl-CoA and that of acetyl-CoA. The P(HB-co-HV)
synthesis pathway from ethanol and n-pentanol has already been shown [11].

From the analysis of the stoichiometry of metabolic reactions, the molar
fraction of 3HV units is represented by the consumption rates of ethanol and
n-pentanol as a function of the flux ratio, m, at the branch point of
3-ketovaleryl-CoA. It was found that the flux ratio at another branch point,
that of 3-ketobutyryl-CoA, n, was constant owing to the balance of consumption
and regeneration of NADPH in the cell. The flux ratio, m, at the branch point
of 3-ketovaleryl-CoA was determined from 13C-NMR analysis. Figure 8 shows the
relationship between the mole-based consumption rate of ethanol and n-pentanol,
R, and the molar fraction of 3HV units. The relationship is linear, and the
split ratio, m, was determined from the slope of the line. With the determined
value, the 3HV molar fraction was controlled by manipulating the feeding rates
of the two alcohols independently.
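
The control idea above (fit the linear relationship, then invert it to find the
feed setting for a desired 3HV fraction) can be sketched as follows. The data
points in the usage example are invented placeholders, not the measurements of
Fig. 8, and the function names are hypothetical; how the fitted slope maps to
the split ratio m depends on the stoichiometric model of [11] and is not
reproduced here.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_for_target(slope: float, intercept: float, target_fraction: float) -> float:
    """Invert the fitted line: the R expected to give the target 3HV fraction."""
    return (target_fraction - intercept) / slope


if __name__ == "__main__":
    # Placeholder observations (R, 3HV molar fraction in percent); made up.
    r_values = [0.1, 0.2, 0.3, 0.4]
    hv_percent = [10.0, 20.0, 30.0, 40.0]
    slope, intercept = fit_line(r_values, hv_percent)
    # Feed setting for a 35 percent target, the value marked in Fig. 8.
    print(r_for_target(slope, intercept, 35.0))
```

In practice the fitted line would be re-estimated whenever new observations of
R and the 3HV fraction become available.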

Fig. 7. Metabolic map of P(HB-co-HV) synthesis



6 Heterogeneity of Cell Population in Continuous 
Culture of E. coli 

It was generally assumed that cells in culture systems are homogeneous.
However, recent studies have pointed out that a heterogeneous population may
emerge and persist in the system. Moreover, at high cell densities, groups of
cells appeared that differed from the cell population normally observed in the
exponential growth phase [12]. Cells should interact with each other through
their shared environment, by sharing nutrients and extracellular metabolites.
These phenomena should be associated with the start of the development of
symbiosis and the integration of different bionetworks.

Fig. 8. Relationship between the mole-based consumption rate of ethanol and
n-pentanol, R, and the molar fraction of 3HV units. Closed circles indicate
experimental observations of the relationship between R and the 3HV unit molar
fraction. The cross indicates the target value of the 3HV unit (35 percent) and
the open square indicates the control result.

In this study, a glutamine synthetase gene (glnA)-deficient mutant of E. coli,
YMC21, was used as the host. The plasmid pKGN-EGFP, carrying the glnA-egfp
fusion gene, was introduced into the glnA-deficient mutant. Using the
recombinant strain, the physiological state of the cells was determined. A
minimal medium containing glucose and glutamate as the sole carbon and nitrogen
sources, respectively, was used. Continuous culture facilitated the setup of
steady-state environmental conditions, with constant and equal input and output
feed rates.
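
The link between equal feed rates and a steady state can be made explicit with
the textbook balance for an ideal, well-mixed continuous culture. The symbols
below ($X$ for cell concentration, $\mu$ for specific growth rate, $D$ for
dilution rate, $F$ for feed rate, $V$ for culture volume) are notation assumed
here, and the balance ignores cell death and wall growth.

```latex
% Cell balance in an ideal continuous culture with equal inflow and outflow F
% and constant volume V:
\[ \frac{dX}{dt} = (\mu - D)\,X, \qquad D = \frac{F}{V} \]

% At steady state with a non-zero population, growth balances washout:
\[ \frac{dX}{dt} = 0,\; X > 0 \quad\Longrightarrow\quad \mu = D \]
```

Under this idealization, any change in the feed composition moves the culture
through a transient in which $\mu \neq D$ until a new steady state is reached.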

Continuous culture was performed at several dilution rates and glutamate
concentrations. The concentrations of cells, glucose and glutamate were
monitored as macroscopic variables, and the cell population was evaluated by
morphological observation of the cells and by measuring the intensities of
forward scattering (FS) and GFP fluorescence using a flow cytometer. The
macroscopic variables remained constant over twenty generations of the cells,
which suggests that the environmental conditions and macroscopic states of the
cells were evaluated in a steady state. However, observation using a
fluorescence microscope showed that the sizes of the cells and the intensities
of GFP fluorescence were not homogeneous.

Figure 9 shows the changes in GFP intensity and forward scattering intensity
(FS) on the log scale determined by flow cytometry when the glutamate
concentration in the feeding medium was changed. When the glutamate concentration
in the feeding medium decreased, cell concentration decreased accordingly. In this
transient state, the specific growth rate decreased for 24 hours. During this phase,
the cell population with a high GFP fluorescence intensity became the major
group in the system, and the number of cells with a low GFP fluorescence
intensity decreased. On the other hand, when the glutamate concentration in the
feeding medium increased, cell concentration increased accordingly. In this
transient state, the specific growth rate increased for 24 hours. During this
phase, the cell population with a low GFP fluorescence intensity became the major
group in the system, and the number of cells with a high GFP fluorescence
intensity decreased. Interestingly, the cell population with the high or low GFP
fluorescence intensity was not maintained, and the original highly diverse cell
population re-emerged after the steady state was reached. By clarifying the role
of each cluster or population of cells, the cell-cell interaction and
interconnection will be analyzed in future experiments.



Fig. 9. Change in cell population as observed using flow cytometer. Glutamate
concentration in the feeding medium was changed. Top figures: steady state.
Lower figures: transient state.



7 Analysis of Bionetworks for IT Evolution

In this paper, the responses of multihierarchical and complex bionetworks com- 
posed of genes, proteins, and metabolic networks are discussed. It was found that 
bionetworks flexibly rearrange under extreme environmental conditions such as 
high osmotic pressure and nitrogen depletion. Not only a change in gene
expression pattern but also constraints in redox and energy balances are important in
the rearrangement of metabolic networks and cell physiology. This network re- 
arrangement is influenced not only by environmental change but also by cell-cell 
interaction. 

Bionetworks and IT networks have similar topologies: they are not random
but scale-free, growing networks [13]. Additionally, the specifications to be
implemented in next-generation ITs are surprisingly similar to the
characteristics of bionetworks, such as flexibility, diversity, adaptability,
self-organization, and robustness. Among these characteristics, it was pointed
out that one of the significant problems in the operation of large-scale and
growing networks is the difficulty of modeling the entire network. Thus, a novel
operation strategy is highly desired, in which only local information is used
for decision making without the need to model the entire network completely. A
living organism is a typical example of a flexible and robust network that is
organized by local rules only.

In this sense, rather than pursuing optimal conditions or optimal effectiveness,
which presuppose complete information on and knowledge of the network,
stability with reasonable performance of the network should be
considered. The autonomous mechanisms by which bionetworks recover from a poor
state should be studied. Variables for measuring the activity of an entire network
should also be studied. To attain these objectives, studies on flexible changes in
the topology and connections of bionetworks in response to changes in
environmental conditions, as discussed in this paper, will contribute to the
development of new ideas for next-generation ITs.

Abstract. Molecular evolution was observed through three serial cycles
of consecutive random mutagenesis of the glutamine synthetase gene and
chemostat culture of transformed Escherichia coli cells containing the
mutated genes. From an initially diversified mutant population in each
cycle, several varieties of mutants reached the state of coexistence. In
addition, a once extinct mutant gene was found to have the capacity
to coexist with the gene pool of a later cycle of molecular evolution. These
results, together with the kinetic characteristics of the purified wild-type
and mutant glutamine synthetases in the phylogenetic tree, revealed that the
enzyme activity has diverged rather than been optimized to a fittest value
during the course of evolution. Here, we propose that the flexibility
of the fitness of a gene in response to cellular interaction via the
environment is an essential mechanism governing molecular evolution.



1 Introduction 

Biological molecules such as an enzyme and its gene are products of natural
evolution brought forth by many factors, each of which contributes to the
complex phenomena commonly observed in the field of molecular evolution. In an
effort to comprehend the mechanism of molecular evolution, experimental evolution
in a simplified system encompassing the essence of natural evolution has gained
ground in the study of biological evolution (1, 2). Experimental evolution
studies on Drosophila (3) revealed that a population maintained for several
generations in a single environment retained protein polymorphism at some
genetic sites. In addition, stable coexistence of some mutant flies isolated
from the population was observed (4), a phenomenon that seems to be not just a
consequence of a balance between the supply of accidental mutations and their
removal by selection and/or genetic drift, i.e., the standard interpretation for
molecular polymorphism (5).

Recently, some phenotypes from experimentally evolved Escherichia coli
populations in a simple unstructured environment were reported to diverge, and
this diversity was maintained (6). Though the physiological features of these
phenotypes were so similar as to guarantee severe competition, the phenotypes
did coexist within the environment. It was suggested that mutations occurring at
different genetic sites during the experimental evolution led the phenotypes to
acquire a complementary relationship for coexistence in the same environment (6).
While a small number of mutations at different genetic sites are sufficient to
allow coexistence, are mutations at multiple genetic sites really necessary for
maintaining the diversity of a single gene during evolution?

In pursuit of this question, we have previously established a simple chemostat
culture involving E. coli strains differing only in the glutamine synthetase
gene. Under functional selection, the E. coli strains achieved stable
coexistence irrespective of mutations, if any, other than those on the
glutamine synthetase gene (7, 8). In this paper, molecular evolution of the
glutamine synthetase gene was observed through three serial cycles of
consecutive random mutagenesis of the gene and competition of its products in
the simple unstructured environment of the chemostat stated above. The results
suggest that the outcome of evolution tends not to be the existence of one
fittest gene but of several closely related genes with appropriate fitness for
coexistence. Here, we propose that the essential mechanism of molecular
evolution is the flexibility of the fitness of a gene towards cellular
interaction via the environment. Changes in the fitness temporarily suppress
the optimization process, allow the coexisting state, and direct
diversification.

2 Materials and Methods 

2.1 Materials 

E. coli YMC21 (Δ(glnA-glnG)2000 ΔlacU169 endA1 hsdR17 thi-1 supE44) lacking
the glutamine synthetase gene (9) was kindly provided by Dr. Boris Magasanik
(Massachusetts Institute of Technology). A hybrid plasmid pKGN containing
the E. coli glutamine synthetase gene was prepared previously (10). pKP1500 (11)
was a generous gift from Dr. Takeyoshi Miki (Kyushu University). Glutaminase
was a generous gift from Daiwa Kasei Co. Ltd. (Osaka). DNA manipulation and
transformation of E. coli cells were carried out as described by Maniatis et al.
(12). Mutation rate was determined based on the mutated gene harbored in the
colonies isolated from the plate.



2.2 Random Mutagenesis 

Error-prone PCR using ΔTth polymerase (13) and primers
5'-GGGCCAGAACCTGAATTCTTCC-3' and 5'-GTTTTGGGATAAAGGTGGGGGTT-3' was
employed for mutating the G466 to G709 region of the glutamine synthetase gene
on plasmid pKGN. PCR fragments were digested with EcoRV and BgRl and ligated to
pKGN previously digested with the same restriction enzymes. E. coli YMC21
was then transformed with the hybrid plasmids. An aliquot of 0.1 ml from a total
of 0.51 ml transformation mixture was plated on an LB agar plate (12)
containing 50 μg/ml ampicillin. Mutation rate was determined based on the mutated
gene harbored in the colonies isolated from the plate.



2.3 Pre-culture of Transformants

A 0.2 ml aliquot of the transformation mixture (0.51 ml) was inoculated in 5
ml of medium C (0.1 M L-glutamate, 4 g/l glucose, 10.5 g/l K2HPO4, 4.5 g/l
KH2PO4, 50 mg/l MgSO4·7H2O, 5 mg/l thiamine HCl, and 50 mg/l ampicillin),
and the transformant mixture was grown at 37°C for 12 h. An aliquot of the
12-h culture (OD600 × volume = 1.0) was then transferred into fresh medium
(100 ml) and cultivated further for 48 h at 37°C before the culture was divided
into two equal volumes (40 ml each) and centrifuged. The cells collected were
used as inocula for two concurrent runs of chemostat culture.



2.4 Chemostat Culture 

Culture conditions were the same as those described previously (7, 8) except 
for the glutamate concentration and the value of dilution rate, which were fixed 
at 0.1 M (medium C) and 0.075/h, respectively. Glutamate serves as the sole 
nitrogen source. 



2.5 Characterization of Mutant Enzymes

Genes encoding mutant enzymes to be characterized were subcloned into a modified
pKP1500 having an additional XbaI site. E. coli YMC21 cells transformed
with the hybrid plasmids were cultivated at 37°C on medium C. The enzymes
produced were purified as reported previously (10) except that a Q Sepharose
Fast Flow column was used instead of a DEAE-Sepharose CL-6B column.
Glutamine synthetase activity of the purified enzymes was measured as described
previously (10).



2.6 Assay of Glutamine Concentration in Chemostat Culture

A 65-ml sample of culture effluent was collected and centrifuged, and the
supernatant was passed consecutively through three molecular weight cut-off
filters, 100,000, 30,000, and 5,000 (Millipore-UFC4THK, UFC4LTK, and UFC4LCC,
respectively), to remove contaminating biopolymers. Glutamine in solution was
separated from glutamate by charging the filtrate onto an AG1-X8-acetate column
(Bio-Rad), which adsorbs glutamate. The pass-through fractions were collected
and concentrated, and the volume of the concentrated filtrate was measured.
Glutamine concentration was measured either by an enzymatic method using
glutaminase and glutamate dehydrogenase (14) or by a chromatographic method as
follows. Glutaminase was added to a 2-ml aliquot of the concentrated filtrate,
and the reaction mixture was incubated at 37°C for 120 min. The reaction mixture
was then loaded onto an AG1-X8-acetate column, and the adsorbed glutamate was
eluted with 1 M acetic acid. The amount of glutamate in the eluate was measured
by the PICO-TAG method (15). The glutamate concentration in another 2-ml aliquot
of the concentrated filtrate without the glutaminase treatment was measured
as well. The glutamine concentration in the chemostat culture was estimated
from the difference between the two glutamate concentrations measured with and
without glutaminase treatment.
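
The difference calculation described above can be sketched as follows. This is
only an illustration under stated assumptions: glutaminase converts glutamine
to glutamate completely and mole for mole, recovery through the filters and
column is complete, and the result is scaled back from the concentrated
filtrate to the original effluent volume. The function name and all numerical
inputs are hypothetical and are not taken from the paper.

```python
def glutamine_in_culture_uM(glu_with_uM, glu_without_uM,
                            effluent_ml, concentrate_ml):
    """Estimate glutamine (micromolar) in the original culture effluent.

    glu_with_uM / glu_without_uM: glutamate measured in the concentrated
    filtrate with and without glutaminase treatment.
    Assumes complete 1:1 hydrolysis of glutamine and full recovery.
    """
    if effluent_ml <= 0 or concentrate_ml <= 0:
        raise ValueError("volumes must be positive")
    # Extra glutamate appearing after glutaminase treatment equals the
    # glutamine that was present in the concentrated filtrate.
    gln_in_concentrate_uM = glu_with_uM - glu_without_uM
    # Undo the concentration step to refer back to the culture effluent.
    return gln_in_concentrate_uM * concentrate_ml / effluent_ml


# Hypothetical example: 65 ml of effluent concentrated to 5 ml.
example = glutamine_in_culture_uM(25.0, 5.5, effluent_ml=65.0, concentrate_ml=5.0)
print(f"estimated glutamine in culture: {example:.2f} uM")
```

The 65-ml effluent volume follows the text; the 5-ml concentrate volume and the
two glutamate readings are made-up inputs chosen only to exercise the function.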

3 Results and Discussion 

3.1 Experimental Molecular Evolution of Glutamine Synthetase 
Gene 

A mixture of transformants harboring glutamine synthetase genes diversified by
in vitro random mutagenesis was subjected to continuous chemostat cultivation
in a well-stirred environment to avoid any spatial bias in the culture
of approximately 10^° cells. Glutamine synthetase genes were extracted from all 
the cells in the chemostat at the end of the first cycle of molecular evolution 
(270 h), pooled, and subjected to another in vitro random mutagenesis. For the 
second cycle of molecular evolution, the consecutive processes were the same as 
stated above except that cultivation in the chemostat was extended to 552 h. 
Genes from the end of the second chemostat culture were once again subjected to 
in vitro random mutagenesis as the preparatory step for the third chemostat run 
for 552 h. Mutation rate due to the in vitro random mutagenesis, estimated from 
the nucleotide sequences of the genes isolated from randomly chosen 58 clones 
after each mutagenesis, was 0.77 substitutions per gene with a high preference 
on A to G and T to G mutations as reported previously (13, 16). Among the 
55 substitutions detected, 24 were synonymous and 31 were non-synonymous 
substitutions. 

Molecular phylogeny and population dynamics were deduced from the nucleotide
sequences of the glutamine synthetase genes isolated from each of the
three chemostat runs. Twenty clones were randomly chosen at 270 h of the first
chemostat culture, while 50 clones each were likewise chosen at 270 and 552 h
of the second, and at 552 h of the third culture (Fig. 1). From a diversified
first mutant population, only the wild-type (W1) and two mutant genes (A1 and
W2) were found, with the A1 gene being the major one at the end of the first
chemostat run. The other genes present during the initial stage may have gone
extinct or decreased in frequency below the sampling probability (5%). The final
population of the second cycle of molecular evolution showed the presence of the
W1 and A1 genes; dominating the population, however, was a new mutant gene, C1,
which seems to be a derivative of W1. A decrease in the variety of the mutant
populations during the chemostat cultures was evident when the second chemostat
run was extended to 552 h (Fig. 1). Only 3 types of genes were detected at 552
h out of the 9 varieties found at 270 h: A1, A2, B1, C1, W1, W2, W3, W4, and
W5. The A2 gene seems to be derived from A1; W3, B1, and C1, from W1; and
W4 and W5, from W2. The frequency of the six mutants not detected at 552 h
may have decreased to a level below the detection limit (2%). At the end of the
third chemostat run, the A1 gene regained its standing as the major gene in the
population, with C1 present in addition to 7 new mutant genes
(Fig. 1). A3 appears to be a derivative of A1, while C2, C3, D1, E1,
F1, and G1 are derivatives of C1.

Reproducibility of the results was attested by sequence analysis of the
glutamine synthetase genes at 270 h from randomly chosen clones in duplicate
runs of the first chemostat culture. Both final population structures comprised
a majority of the A1 gene with the W1 gene as one of the minority genes (see
Fig. 1). Though the minority genes differ between the runs, picking different
low-frequency genes is to be expected under random sampling. Hence, the
resulting population structure was reproducible and not incidental.

Here, we conclude that the glutamine synthetase gene has evolved under
the selection pressure imposed solely on the gene itself, based on the following
facts and results of the experimental molecular evolution described above. (i) The
glutamine synthetase gene is critical for the survival of the host strain, E. coli
YMC21, which is deficient in the gene, under the culture conditions employed (7).
(ii) The only difference among the transformants in the chemostat lies in the
glutamine synthetase gene carried on the plasmid. (iii) The presence of selection
pressure on the gene was evident from the elimination of many of the varieties of
genes present at the initial stage of each chemostat run. (iv) While a total of
24 synonymous and 31 non-synonymous substitutions were found in randomly
chosen clones from the initial mutant populations of the three chemostat cultures,
a total of 15 synonymous and 8 non-synonymous substitutions were detected at the
end of the three cultures (Fig. 1). The high frequency of synonymous mutations
(P < 0.001) implies the existence of selection pressure on the gene. (v) With the large
population (10^*^) used in observing the molecular evolution, it is unlikely that 
the perceived changes in the gene frequency (Fig. 1) were attributed solely to 
genetic drift (17). For instance, the frequency of the C1 gene in the second
chemostat culture increased from 70% at 270 h to 88% at 552 h. The average rate
of frequency change per generation exceeds the order of 10^-3 [(0.88 - 0.70)/((552
- 270) × 0.075), where 0.075 represents the generations per hour based on the
dilution rate]. This value is much bigger than the expected 10“® order if genetic
drift is assumed for the frequency change in such a large population. (vi)
Although genomic mutation may occur during the chemostat culture even for
the Rec⁻ host strain, the effects of genomic mutation on the dynamics of the
molecular evolution of the gene under study were confirmed to be very small
by the reproducibility of the experimental results.

Fig. 1. Molecular evolution of glutamine synthetase. Phylogenetic tree of the
glutamine synthetase gene. W1 denotes the wild-type gene. Each pairing of a
capital letter and a number represents a type of mutant gene. Each letter
denotes a different deduced amino acid sequence, and genes with the same letter
but different numbers are synonymous mutants. The number of
synonymous/non-synonymous mutations appears in parentheses. Double-headed
vertical arrows represent the time span of cultivation. The percentages written
under each letter-number pair represent the frequency of each strain
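
The arithmetic behind point (v) can be checked with a short sketch. The
observed rate uses only numbers given in the text (frequencies 0.70 and 0.88,
270 h and 552 h, 0.075 generations per hour). For comparison, the sketch uses
the textbook Wright-Fisher result that the standard deviation of a
one-generation frequency change under pure drift is sqrt(p(1 - p)/N); this
formula and the population size used below are assumptions introduced for
illustration, not values taken from the paper.

```python
import math


def mean_frequency_change_per_generation(f_start, f_end, t_start_h, t_end_h,
                                         generations_per_h):
    """Average change in gene frequency per generation between two samples."""
    generations = (t_end_h - t_start_h) * generations_per_h
    return (f_end - f_start) / generations


def drift_sd_per_generation(freq, population_size):
    """Standard deviation of the one-generation frequency change under
    pure drift (haploid Wright-Fisher sampling): sqrt(p * (1 - p) / N)."""
    return math.sqrt(freq * (1.0 - freq) / population_size)


# Values quoted in the text for the C1 gene in the second chemostat run.
observed = mean_frequency_change_per_generation(0.70, 0.88, 270, 552, 0.075)

# Hypothetical population size, used only to show the order-of-magnitude gap.
ASSUMED_POPULATION_SIZE = 1e10
drift = drift_sd_per_generation(0.70, ASSUMED_POPULATION_SIZE)

print(f"observed change per generation: {observed:.2e}")
print(f"drift-only scale (assumed N):   {drift:.2e}")
```

With these inputs the observed rate is about 8.5 × 10^-3 per generation. Any
population size in the billions gives a drift scale several orders of
magnitude smaller, which is the point the argument relies on.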



3.2 Mechanism of Coexistence as Means of Diversification 

Clearly, as described above, different types of the glutamine synthetase gene
coexisted under a selective environment regardless of the long span of each
evolution cycle. Moreover, the effect of mutations, if any, on genetic sites
other than the glutamine synthetase gene was very small and hence not necessary
for maintaining the diversity of the gene during the evolution. In our previous
work, closely related competitors of E. coli strains differing only in the
glutamine synthetase gene were shown to reach a stable coexistence (7, 8).
Subsequently, a mathematical model revealed that the essential mechanisms of
coexistence involve the interplay between the internal metabolic network of the
competitors and the external variables of the environment, thereby allowing
competitors to reach the same growth rate and coexist in the same environment (18).

In view of the interplay between the internal metabolic network of the
competitors and the environment, glutamine, a product of the glutamine synthetase
reaction, may be one of the important external variables. To evaluate the role of
glutamine, competition experiments between the A1 and W2 genes found at the
end of the first chemostat run were performed (Fig. 1). Duplicate runs under the
same conditions as in the molecular evolution resulted in the same population
structure although the transient behavior was different (Fig. 2).

The results indicate that the observed coexisting state is not incidental but
may be governed by an underlying mechanism as presented in the mathematical
model (18). Indeed, glutamine was detected at an average value of 1.5 μM in
the culture sampled at the indicated times of each chemostat run (arrows in Fig. 2).
As glutamine was not added to the fresh medium fed to the chemostat, there
could be a leakage of glutamine from the dead or live cells of both or either
of the strains with the A1 or W2 gene. When glutaminase was added to the
feeding medium to hydrolyze glutamine leaked into the medium, coexistence of
the strains no longer prevailed and the W2 gene was excluded (Fig. 2). These
results indicate that glutamine is one of the important factors for the observed
coexistence. Hence, cellular interaction among competitors through their major
nutrients such as glutamine allows the maintenance of mutant genes within a
population. Cross-feeding of a nutrient has been demonstrated to be a possible
mechanism for the maintenance of coexistence of E. coli strains with different
genotypes in a simple environment (19, 20). Here, we showed that through
cellular interaction mediated by the nutrients, competitors differing even in
only a single gene were capable of reaching a state of coexistence. The
mechanisms underlying the coexistence of even closely related competitors in a
minimal model system may provide the basis for bio-diversification occurring in
nature.

Fig. 2. Competition between the W2 and A1 genes. Plasmids containing the W2 gene
or the A1 gene, at the same concentration, were each used to transform fresh
host cells independently. The total resultant transformants from each
transformation were mixed and an aliquot was used as inoculant for the chemostat
run as described under Materials and Methods. Frequency of A1 was determined as
follows: an aliquot of the chemostat culture was sampled at the indicated time
and plated on a 2xTY agar plate (12) containing 50 μg/ml ampicillin. The
plasmids from 26 randomly picked clones per sampling were analyzed by double
restriction endonuclease digestion. W2 has only the PstI site, while A1 has both
the PstI and BstEII sites. (○) and (△), parallel runs, A and B, of the
competition experiments; (•), glutaminase (final 0.47 unit/ml) was added to
the feeding medium. Glutamine concentrations in the culture medium were measured
at the indicated times (*) either by the PICO-TAG (for the A run) or enzymatic
method (for the B run) as described under Materials and Methods. At 336 h of the
A run, 1.2 μM glutamine was detected in the culture, while at 221 h, 268 h,
336 h, and 410 h of the B run, glutamine concentrations in the chemostat culture
were 0.8 μM, 1.9 μM, 2.2 μM, and 1.3 μM, respectively
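
As a quick consistency check, the average glutamine concentration quoted in the
text (about 1.5 μM) can be reproduced from the five measurements listed in the
Fig. 2 caption. The sketch below simply takes their unweighted mean; treating
the A-run and B-run values as one pooled sample is an assumption made here for
illustration.

```python
from statistics import mean

# Glutamine concentrations (micromolar) as listed in the Fig. 2 caption.
glutamine_uM = {
    "A run, 336 h": 1.2,
    "B run, 221 h": 0.8,
    "B run, 268 h": 1.9,
    "B run, 336 h": 2.2,
    "B run, 410 h": 1.3,
}

# Unweighted mean over both runs (pooling the runs is an assumption).
average_uM = mean(list(glutamine_uM.values()))
print(f"mean glutamine concentration: {average_uM:.2f} uM")
```

The mean of the five values is about 1.48 μM, which rounds to the 1.5 μM given
in the text.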



3.3 Diversification and Not Optimization in Molecular Evolution 

The experimental molecular evolution showed an exclusion of many of the
varieties of closely related genes produced by random mutagenesis through the
selection pressure imposed on the gene, yet some remained and reached the
state of coexistence. Competitors could have adjusted their fitness by means of
the interplay between the internal states of the competitors and the environment.
Hence, the environmental conditions as well as the internal state of a competitor
vary depending on the population structure. Therefore, when a gene type exists
with different groups of competitors, its fitness differs in each case in accordance
with the change in the environmental conditions. In fact, the A1 gene, dominant in
the first cycle of molecular evolution, was replaced by the C1 gene in the second
cycle and was back to its dominant status in the third cycle (Fig. 1). Could
it then be that only gene types with higher fitness have been selected during
the continuous progression of molecular evolution? To clarify the issue, the W2
gene, which went extinct or may have decreased in frequency below the detection
limit (2%) during the second chemostat run, was subjected to the following
competition experiments. The W2 gene was mixed with the pooled genes isolated
from the final population of the third chemostat run before introduction into
fresh host cells, and the total resultant transformants were cultivated in the
chemostat under the same conditions used for the molecular evolution experiments.
Interestingly, the W2 gene was able to coexist with the new population struc- 
ture regardless of the initial frequency of the W2 gene (Fig. 3). If the selection 
in the evolutionary system is based on gene fitness determined by its sequence 
and culture conditions and is independent of the presence of competitors, one 
should expect that genes pooled from the final population of the third chemostat 
have higher fitness than those of the second chemostat. In such a case, the W2
gene would have no chance of surviving when mixed with the population of the third
chemostat. However, the results indicate that improvement in gene fitness is not
always underway in the course of evolution. Although it may be natural for genes
with higher fitness to be selected in a competition in a chemostat, the emergence 
of new genes by mutation during the evolution process may set forth a change 
in the environment of the selected genes due to new cellular interaction. As a 
consequence, fitness of the genes changes. Accordingly, the fitness of the genes 
selected by an evolution cycle is not necessarily higher than that selected by the 
previous cycle. In addition, in most of the evolution cycles, several genes had 
reached the state of coexistence. Therefore, the outcome of the evolution tends 
not to be an existence of one fittest gene but of several genes with appropriate 
fitness for coexistence. 

As cellular interaction among competitors temporarily relieves the optimization
process in evolution, an increase in gene fitness may be possible if interactions
are weakened during competition, for instance, under low population density.
When the glutamate supplemented in the feeding medium was of very low
concentration (10 μM), not only did the total population density decrease from 10®
cells/ml to 10® cells/ml, but the competition also resulted in the dominance
of the W2 gene (Fig. 3, filled circle). Fig. 3 thus suggests that low population
density promotes fitness increase while high population density drives
diversification during evolution.

Although the fitness of the glutamine synthetase gene was shown not to be
optimized in the process of evolution, the function of the gene product, i.e.,
glutamine synthetase activity, may have been. The wild-type and mutant enzymes
in the phylogenetic tree in Fig. 1 were purified, and their kinetic properties
were analyzed at different glutamate concentrations. Results showed that enzymes
coexisting in a population have different kcat and kcat/Km values (Fig. 1 and
Table 1). Saturation theory assumes that natural selection would cease to operate
only if a function has reached a level that has no additional benefits to the
fitness of a strain in a population (21). Generally, it is presumed that a strain
with a mutant gene expressing a catalytic activity closer to the fittest value
will dominate a heterogeneous population. Therefore, as the population propagates
over several generations, the average enzyme activity over the population is
expected to converge towards the value of the fittest activity through either a
monotonic increase or decrease. On the contrary, as shown in Fig.
4, fluctuation was observed in the average value of glutamine synthetase activity.

Fig. 3. Competition of the W2 gene with members of the final population of the third
chemostat run. The once extinct W2 gene was mixed with the pooled genes isolated
from the final population of the third chemostat run (see Fig. 1). Frequency of the
W2 gene was adjusted to 0.8 (○), 0.5 (○), and 0.1 (△) before mixing with the pooled
genes. The three gene mixtures were introduced independently to fresh host cells and
each of the total resultant transformants was cultivated in one of three chemostat
runs (○, ○, and △) initiated with the corresponding initial frequency of the W2
gene. Filled circle (•) represents a competition experiment conducted in the same
manner as described above with W2 gene frequency and glutamate concentration in
the feeding medium reduced from 0.1 M to 10 μM. Frequency of W2 in the culture
was determined as in Fig. 2



Table 1. Kinetic constants of the wild-type and mutant glutamine synthetases in the
phylogenetic tree of Fig. 1. Glutamine synthetase activity was measured as described
previously (10) with varying L-glutamate concentration from 1 mM to 100 mM. The
specific activity of a mutant enzyme encoded by the D1 gene was less than the lower
detection limit (10“^ s^-1)

type of enzyme    Km (mM)    kcat (s^-1)    kcat/Km (mM^-1 s^-1)
W                 12.1       40.3           3.3
A                 7.5        20.7           2.8
B                 1.1        29.6           26.1
C                 3.3        28.9           8.9
E                 23.2       48.0           2.1
F                 49.4       6.4            0.13
G                 9.5        6.1            0.64
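
The column assignment in Table 1 can be checked by recomputing the catalytic
efficiency kcat/Km from the first two columns and comparing it with the third.
The sketch below does this for every row; the 5% tolerance is an arbitrary
choice made here to allow for rounding of the tabulated values, and the
function names are illustrative only.

```python
# Table 1 rows as (Km in mM, kcat in 1/s, reported kcat/Km in 1/(mM*s)).
TABLE_1 = {
    "W": (12.1, 40.3, 3.3),
    "A": (7.5, 20.7, 2.8),
    "B": (1.1, 29.6, 26.1),
    "C": (3.3, 28.9, 8.9),
    "E": (23.2, 48.0, 2.1),
    "F": (49.4, 6.4, 0.13),
    "G": (9.5, 6.1, 0.64),
}


def catalytic_efficiency(km_mM, kcat_per_s):
    """Catalytic efficiency kcat/Km in 1/(mM*s)."""
    return kcat_per_s / km_mM


def relative_deviation(reported, computed):
    """Relative difference between a tabulated and a recomputed value."""
    return abs(reported - computed) / computed


for enzyme, (km, kcat, reported) in TABLE_1.items():
    computed = catalytic_efficiency(km, kcat)
    deviation = relative_deviation(reported, computed)
    print(f"{enzyme}: reported {reported:g}, recomputed {computed:.2f}, "
          f"deviation {deviation:.1%}")
```

All seven rows agree with kcat/Km to within a few percent, which supports
reading the three numeric columns as Km, kcat, and kcat/Km.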



Accordingly, glutamine synthetase activity has diverged during the course of
evolution rather than been optimized to a fittest value. We therefore propose that
an essential mechanism governing molecular evolution is the flexibility of the
fitness of a gene towards cellular interaction via the environment. The changes
in the fitness of genes temporarily suppress the optimization process, allow the
coexisting state, and direct diversification. The molecular diversification found
in the experimental evolution described so far may serve as a basis for
addressing issues of bio-diversification and speciation.

Fig. 4. Average values of the kinetic constants over the population at indicated time of
the three chemostat runs. The average values were calculated from the data shown in
Fig. 1 and Table 1. Filled triangle represents the value of kcat and open circle represents
the value of kcat/Km
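
A population average of the kind plotted in Fig. 4 can be sketched as a
frequency-weighted mean of the Table 1 constants. The example below uses the
end of the second chemostat run, where the text gives 88% for C1 and reports
that W1 and A1 were also present; splitting the remaining 12% equally between
W1 and A1 is an assumption made here, and the weighting scheme itself is one
plausible reading of "average over the population" rather than the authors'
stated procedure.

```python
# (kcat in 1/s, kcat/Km in 1/(mM*s)) for the enzyme types involved, from Table 1.
KINETICS = {
    "W": (40.3, 3.3),
    "A": (20.7, 2.8),
    "C": (28.9, 8.9),
}

# Gene frequencies at 552 h of the second run. C1 = 0.88 is from the text;
# the equal 0.06 / 0.06 split for W1 and A1 is an assumption.
ASSUMED_FREQUENCIES = {"W1": 0.06, "A1": 0.06, "C1": 0.88}


def population_average(frequencies, kinetics):
    """Frequency-weighted mean of kcat and kcat/Km over a population.

    Genes sharing a letter are synonymous mutants and therefore share
    the same enzyme, so the first character selects the enzyme type.
    """
    if abs(sum(frequencies.values()) - 1.0) > 1e-9:
        raise ValueError("frequencies must sum to 1")
    avg_kcat = sum(f * kinetics[gene[0]][0] for gene, f in frequencies.items())
    avg_eff = sum(f * kinetics[gene[0]][1] for gene, f in frequencies.items())
    return avg_kcat, avg_eff


avg_kcat, avg_efficiency = population_average(ASSUMED_FREQUENCIES, KINETICS)
print(f"average kcat:    {avg_kcat:.2f} 1/s")
print(f"average kcat/Km: {avg_efficiency:.2f} 1/(mM*s)")
```

Under these assumptions the averages are about 29.1 s^-1 and 8.2 mM^-1 s^-1.
Repeating the calculation for each sampling time would give the kind of time
course in which fluctuation, rather than monotonic convergence, can be seen.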

Abstract. Prediction occurs in many biological nervous systems, e.g., in
the cortex [7]. We introduce a method of adapting the recurrent layer
dynamics of an echo-state network (ESN) without attempting to train
the weights directly. Initially a network is generated that fulfils the echo
state - liquid state condition. A second network is then trained to
predict the next internal state of the system. In simulation, the prediction
of this module is then mixed with the actual activation of the internal
state neurons, to produce dynamics that are partially driven by the
network model, rather than the input data. The mixture is determined by
a parameter α. The target function to be produced by the network was
sin®(0.24t), given an input function sin(0.24t). White noise was added
to the input signal at 15% of the amplitude of the signal. Preliminary
results indicate that self prediction may improve performance of an ESN
when performing signal mappings in the presence of additive noise.



1 Introduction 

Recurrent Neural Networks (RNNs) are remarkably versatile sequence processing 
devices. In principle, they are capable of performing a wide range of information 
processing tasks on temporally distributed input, by capturing the input history 
via recurrent connections. However, although RNNs possess very interesting the- 
oretical properties, their efficacy for learning long connected sequences remains 
questionable [10]. Attempts to learn long sequences of data have not shown very 
good results. In particular, learning features and patterns on a very long time 
scale appears problematic for the basic RNN architecture [11]. 

Recently a new type of RNN approach, the echo state RNN (ESN) approach,
has been published [3]. In contrast to previous approaches, the recurrent
connections are not trained. Rather, a constant, randomly connected recurrent
weight matrix is used. The learning is performed only in the connections between
the hidden layer and the output layer. The recurrent layer acts as a large
'reservoir of echoes' of previous inputs, which may be tapped for information by
the adaptive layer [1]. In a standard RNN there are many parameters to be optimized
in the recurrent layer. Well-formed techniques for deciding appropriate values for
these parameters are still in the development stage. The ESN approach makes
it possible to exploit the advantages of RNNs whilst sidestepping the difficulties
that arise in training the recurrent weight connections.

In order to qualify as an ESN, the network must satisfy the conditions 
outlined in [1]. The echo-state condition means that although recurrent 
connections exist, these connections are weak enough that for sufficiently 
long input histories the system converges to a single attractor, and this 
attractor does not depend on the initial state of the network. Instead, the 
system depends on the input history. 
We note that liquid state machines [4] represent an approach that is formally 
equivalent; both concepts were introduced independently at roughly the same 
time. 

Some recent research has shown that this approach may be successfully 
applied to many tasks. However, some problems remain to be solved. One problem 
is noise reduction. In the classic ESN approach learning is performed only in 
the output layer. The internal dynamics are a highly complex and recursive 
function of the input. However, because the recurrent weight matrix is fixed 
and random, the internal dynamics are determined by a random function of the 
input history. The burden of demapping this function to perform the prediction 
task falls entirely on the linear output layer. This appears to place some 
limitations on the efficacy of the resulting mapping, for instance with regard 
to its sensitivity to additive noise. Further, a dilemma intrinsic to the ESN 
architecture exists in the relative weighting that should be given to the 
internal dynamics and the input signal. Jaeger [2] notes that this weighting 
is parameterized by the maximum eigenvalue of the transformation matrix: as 
this variable approaches unity, the internal dynamics of the network become 
increasingly self-sustaining in the absence of an input signal. Determining 
the appropriate maximum eigenvalue is somewhat problematic; smaller values 
increase the sensitivity and responsiveness of the network over short time 
intervals, but at the expense of memory at longer time scales. Increasing this 
factor has the opposite effect. 

We note that both the issues mentioned above are related to the fact that the 
network hidden activations are coupled statically to the input / input history. 
In order to address these problems, one approach is to allow some degree of 
adaption in the recurrent connections. In doing so, a bridge is made between the 
’pure’ ESN and other recurrent network learning methods (see [6,8]). Our aim is 
to decrease the sensitivity of the ESN architecture to noise in the input signal, 
by creating and utilizing an internal model of the network dynamics. 

We introduce a method of adapting the recurrent layer dynamics without 
attempting to train the weights directly. Initially a network is generated that 
fulfils the echo state - liquid state condition. A second network is then trained 
to predict the next internal state of the system. Prediction occurs in many 
biological nervous systems, e.g. in the cortex [7]. It appears in various 
contexts, for example to overcome signal delays, as well as to reinforce 
reward. In the present context, prediction is used to generate an alternative 
version of the original ESN, in which the hidden unit activations represent a 
model of the network’s own behaviour. This creates a source of dynamics that 
is not a direct function of the input signal, but rather of the network’s 
model of the combined signal / echo-state system. The hidden activations of 
the modified ESN are then a composite of the recursive echo history function 
(random, untrained) and a trained generative model of this function. 

After an initial introduction to ESNs, we demonstrate how to perform self- 
prediction in echo-state networks and how self-prediction is implemented in the 
recurrent connectivity. Some initial comparisons with the original ESN are 
presented using synthetic signals. 



Fig. 1. ESN networks: Principle setup (diagram labels: adapted weights, recurrent layer) 



2 Standard ESNs 

The definitions follow [1,3]. The first part consists of a summary of facts 
proven in [1] that are important for the following considerations. 
Consider a time-discrete recursive function $x_{t+1} = F(x_t, u_t)$, where 
$x_t$ are the internal states and $u_t$ is some external input sequence, i.e. 
the stimulus. 

The definition of the echo-state condition is the following: Assume an infinite 
stimulus sequence $u^{\infty} = u_0, u_1, \ldots$ and two random initial 
internal states of the system, $x_0$ and $y_0$. To both initial states $x_0$ 
and $y_0$ the sequences $x^{\infty} = x_0, x_1, \ldots$ and 
$y^{\infty} = y_0, y_1, \ldots$ can be assigned: 

$$x_{t+1} = F(x_t, u_t) \tag{1}$$

$$y_{t+1} = F(y_t, u_t) \tag{2}$$

The system then fulfils the echo-state condition if, for all $\epsilon > 0$, 
there is a $\delta(\epsilon)$ for which 

$$d(x_t, y_t) < \epsilon \tag{3}$$

for all $t > \delta$. The ESN is designed to fulfil the echo-state condition. 
The ESN consists of three layers (refer to Fig. 1). The first layer is the 
input. Here the stimulus is presented to the network. The subsequent hidden 
layer is recurrently connected. The final layer is the output layer, which is 
trained to reproduce the target output. The network dynamics are defined for 
discrete time steps $t$. 

The equations of the dynamics of the hidden layer are 

$$x_{\mathrm{lin},t+1} = W x_t + w^{\mathrm{in}} u_t \tag{4}$$

$$x_{t+1} = \tanh(x_{\mathrm{lin},t+1}) \tag{5}$$

$$y_t = w^{\mathrm{out}} x_t \tag{6}$$

where the vectors $u_t$, $x_t$, $y_t$ are the input and the neurons of the 
hidden layer and output layer respectively, and $w^{\mathrm{in}}$, $W$, 
$w^{\mathrm{out}}$ are the matrices of the respective synaptic weight factors. 
Note that for the output neurons linear connectivity is used.¹ 

As mentioned above, learning is restricted to the connectivity between the 
hidden and the output layer, $w^{\mathrm{out}}$. All other connections are 
chosen randomly. In order to fulfil the echo-state condition the connectivity 
matrix $W$ of the weights of the hidden layer should meet the following 
requirements: 

1. It is necessary that the real part of the biggest eigenvalue of $W$ is smaller than 1. 

2. It is sufficient that the biggest singular value of $W$ is smaller than 1. 

For all matrices $W$ that fulfil requirement 1 but do not fulfil requirement 
2, no general rule is known whether or not the network meets the echo-state 
condition. All real-valued matrices can easily be transformed to matrices that 
fulfil the first or both requirements by multiplying the matrix by an 
appropriate scalar prefactor. 
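
The rescaling by a scalar prefactor can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `scale_reservoir`, the seed, and the uniform initialization are assumptions, and the eigenvalue criterion is read here as the largest eigenvalue magnitude (spectral radius), which is one common interpretation of requirement 1.

```python
import numpy as np

def scale_reservoir(W, target=0.8, mode="singular"):
    """Rescale W by a scalar so that its 'size' equals `target` (< 1)."""
    if mode == "singular":
        # Sufficient condition: largest singular value below one.
        size = np.linalg.svd(W, compute_uv=False)[0]
    else:
        # Necessary condition, read here as the spectral radius (assumption).
        size = np.max(np.abs(np.linalg.eigvals(W)))
    return W * (target / size)

rng = np.random.default_rng(0)
W_raw = rng.uniform(-0.5, 0.5, size=(80, 80))  # 80 hidden units, as in the simulations
W = scale_reservoir(W_raw, target=0.8)

largest_sv = np.linalg.svd(W, compute_uv=False)[0]
spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
```

Scaling by the largest singular value satisfies both requirements at once, since the spectral radius of a matrix never exceeds its largest singular value.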

The convergence of an ESN goes as: 

$$d(x_t, y_t) < a \lambda^t \tag{7}$$

where $\lambda < 1$ and $a$ are constants that are determined by the 
properties of the connectivity matrix $W$. Thus, the convergence of the 
echo-state network is exponential. This also means that the forgetting of the 
initial state $x_0$ is exponential. 
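
The exponential forgetting of the initial state can be checked numerically. The sketch below is illustrative only: it assumes a reservoir whose largest singular value has been scaled to 0.8, a Euclidean distance for $d$, and a sinusoidal stimulus; none of these choices is prescribed by the text at this point. Because tanh is 1-Lipschitz, the distance between the two trajectories then shrinks by at least a factor 0.8 per step.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 80
W = rng.uniform(-0.5, 0.5, size=(n, n))
W *= 0.8 / np.linalg.svd(W, compute_uv=False)[0]  # largest singular value = 0.8
w_in = rng.uniform(-0.5, 0.5, size=n)

def step(x, u):
    """One update of Eqns. (4)-(5) for a scalar input u."""
    return np.tanh(W @ x + w_in * u)

# Two different random initial states driven by the same stimulus.
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.uniform(-1.0, 1.0, size=n)
d0 = np.linalg.norm(x - y)

distances = []
for t in range(100):
    u = np.sin(0.24 * t)
    x, y = step(x, u), step(y, u)
    distances.append(np.linalg.norm(x - y))
```

In the notation of Eqn. (7), this run corresponds to a bound with $\lambda = 0.8$ and $a$ equal to the initial distance `d0`.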

The learning in the output layer $w^{\mathrm{out}}$ is done by solving the 
system of linear equations (which is overdetermined for sufficiently long time 
sequences): 

$$w^{\mathrm{out}} x_t = \text{target output at time } t \tag{8}$$

for all times $t$ of the teaching period, where there is one equation for each 
time step $t$. With a sufficient number of time steps, $w^{\mathrm{out}}$ can 
be calculated by solving these equations. 

¹ The nonlinear output layer definition $y_t = \tanh(w^{\mathrm{out}} x_t)$ is 
the common alternative to Eqn. (6). 

In the offline version of the ESN, Eqn. (8) can be solved by inverting the 
matrix whose row vectors are the states $x_t$. This can be done using singular 
value decomposition (SVD). In addition, online learning can be done with a 
recursive least squares (RLS) learning algorithm [3]. 
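
A minimal sketch of the offline solution of Eqn. (8) is given below. It is not the authors' implementation: the reservoir initialization, the zero initial state, and the cubic target are assumptions made for illustration, and `numpy.linalg.lstsq` (an SVD-based least-squares solver) stands in for the explicit inversion.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, washout = 80, 2000, 150
W = rng.uniform(-0.5, 0.5, size=(n, n))
W *= 0.8 / np.linalg.svd(W, compute_uv=False)[0]
w_in = rng.uniform(-0.5, 0.5, size=n)

u = np.sin(0.24 * np.arange(T))  # input signal
target = u ** 3                  # assumed target signal

# Collect the hidden states x_t; each row gives one equation of Eqn. (8).
x = np.zeros(n)                  # assumption: start from the zero state
states = np.empty((T, n))
for t in range(T):
    states[t] = x
    x = np.tanh(W @ x + w_in * u[t])  # Eqns. (4)-(5)

# Discard the initial transient, then solve the overdetermined system.
X, d = states[washout:], target[washout:]
w_out, *_ = np.linalg.lstsq(X, d, rcond=None)
mse = np.mean((X @ w_out - d) ** 2)
```

With 1850 equations and 80 unknowns the system is overdetermined, so `w_out` is the least-squares solution rather than an exact one.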

3 Using Self-Prediction 

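In the following, not only the hidden-to-output connections 
$w^{\mathrm{out}}$ must be learned but also the linear response of the 
neurons, $x_{\mathrm{lin},t+1}$ (Eqn. (4)), in the next time step. Thus, the 
teaching signal comprises both the target output and 
$x_{\mathrm{lin},t+1}$. (9) 

As in Eqn. (8), the weights can be calculated by solving a system of 
linear equations, 

$$w^{\mathrm{out}} x_t = \text{target output at time } t \tag{10}$$

$$w^{\mathrm{pred}} x_t = x_{\mathrm{lin},t+1} \tag{11}$$

for all times $t$ of the teaching period (except the initial transient 
period). Here $w^{\mathrm{pred}}$ denotes the self-prediction weights. 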

In a first investigation, the quality of the resulting prediction given a 
periodic input signal was tested. Online learning as outlined in [3] was 
compared with offline learning. In the case of offline learning the error of 
the self-prediction is several orders of magnitude smaller than the error 
from RLS. 

The self-prediction was implemented into the modified ESN in the following 
way. The update rule of the recurrent layer is now (cf. Eqns. (4-5)) 

$$x_{\mathrm{lin},t+1} = \left((1 - \alpha) W + \alpha w^{\mathrm{pred}}\right) x_t + (1 - \alpha) w^{\mathrm{in}} u_t \tag{12}$$

$$x_{t+1} = \tanh(x_{\mathrm{lin},t+1}) \tag{13}$$

where $\alpha$ is a constant parameter and $w^{\mathrm{pred}}$ denotes the 
weights trained to predict $x_{\mathrm{lin},t+1}$ from $x_t$. If the quality 
of the prediction is sufficient, $w^{\mathrm{pred}} x_t$ and 
$W x_t + w^{\mathrm{in}} u_t$ should be close to each other. In these 
situations the dynamics of the network remain almost unaffected by the value 
of $\alpha$. In situations where prediction is poor, $\alpha$ is expected to 
determine the degree to which the dynamics are modulated by the internal model 
of the system, versus the actual observed data. 

If $\alpha$ is equal to 1, the network becomes independent of the input. Of 
course, in the present context, this makes sense only in the case of a 
periodic or at least quasi-periodic output signal. In this case the network 
becomes an autonomous pattern generator. A value of $\alpha = 0$ results in 
the modified network reverting to the original ESN. Other values of $\alpha$ 
result in some mixture of the two. The form of the dynamics is identical in 
both the initial network and the network after implementing the prediction. 
From a formal point of view merely the weights of the network are changed. 
Thus one can write Eqn. (12) as: 

$$x_{\mathrm{lin},t+1} = W_{\mathrm{new}} x_t + w^{\mathrm{in}}_{\mathrm{new}} u_t \tag{14}$$

where $W_{\mathrm{new}} = (1 - \alpha) W + \alpha w^{\mathrm{pred}}$ and 
$w^{\mathrm{in}}_{\mathrm{new}} = (1 - \alpha) w^{\mathrm{in}}$, with 
$w^{\mathrm{pred}}$ the weights trained to predict $x_{\mathrm{lin},t+1}$ 
from $x_t$. In this way the equation is formally equivalent to the equation 
for the ESN (cf. Eqn. (4)), except that $W$ is replaced by 
$W_{\mathrm{new}}$ and $w^{\mathrm{in}}$ by $w^{\mathrm{in}}_{\mathrm{new}}$, 
respectively. For sufficiently high values of $\alpha$ the echo-state 
condition is no longer fulfilled, in particular if $\alpha$ is equal to 1. 
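
The construction of the mixed weights can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `mix_weights` and `w_pred`, the reservoir initialization, and the use of `numpy.linalg.lstsq` for the offline fit of the self-prediction weights are all assumptions.

```python
import numpy as np

def mix_weights(W, w_in, w_pred, alpha):
    """Return (W_new, w_in_new) for mixing parameter alpha, cf. Eqn. (14)."""
    W_new = (1.0 - alpha) * W + alpha * w_pred
    w_in_new = (1.0 - alpha) * w_in
    return W_new, w_in_new

rng = np.random.default_rng(3)
n, T, washout = 80, 2000, 150
W = rng.uniform(-0.5, 0.5, size=(n, n))
W *= 0.8 / np.linalg.svd(W, compute_uv=False)[0]
w_in = rng.uniform(-0.5, 0.5, size=n)
u = np.sin(0.24 * np.arange(T))

# Run the unmodified ESN, recording x_t together with x_lin,t+1.
x = np.zeros(n)
X = np.empty((T, n))
X_lin_next = np.empty((T, n))
for t in range(T):
    X[t] = x
    X_lin_next[t] = W @ x + w_in * u[t]
    x = np.tanh(X_lin_next[t])

# Offline fit of the self-prediction weights: w_pred @ x_t ~ x_lin,t+1.
B, *_ = np.linalg.lstsq(X[washout:], X_lin_next[washout:], rcond=None)
w_pred = B.T

W0, w_in0 = mix_weights(W, w_in, w_pred, alpha=0.0)  # original ESN
W1, w_in1 = mix_weights(W, w_in, w_pred, alpha=1.0)  # autonomous generator
```

At `alpha=0.0` the original weights are recovered, and at `alpha=1.0` the input weights vanish, so the network no longer receives the stimulus.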

4 Simulation Details 

The aim of the analysis was to compare the performance of the self-predicting 
network with that of the original ESN. By adjusting the parameter $\alpha$, we 
are able to test the effect of adding self-prediction to the normal echo state 
dynamics. A value of $\alpha = 0$ corresponds to ’pure’ echo state dynamics; a 
value of $\alpha = 1$ corresponds to dynamics governed completely by the 
self-prediction model. Of interest is the value of $\alpha$ that corresponds 
to the best network performance in the presence of noisy input. 

We used one input neuron, 80 neurons in the hidden layer and one output 
neuron. For all simulations presented here the connectivity matrices 
$w^{\mathrm{in}}$, $W$, $w^{\mathrm{out}}$ were fully connected and 
initialized with uniformly distributed random values from -0.5 to 0.5. The 
matrix $W$ was a random orthonormal matrix that was scaled to 0.8. Both the 
RLS and the offline version of the network were tested. However, due to the 
poor prediction quality of the RLS prediction only the offline version 
produced usable results. 

A training period of 2000 iterations was used. The target function to be 
produced by the network was $\sin^3(0.24t)$, given an input function 
$\sin(0.24t)$. White noise was added to the input signal at 15% of the 
amplitude of the signal. ESNs require a brief initialization period in order 
to stabilize the recurrent dynamics. The first 150 iterations were used to 
initialize the network and were not included in the optimization procedure. 
At the beginning of both the transient and the test period the state vector 
$x_0$ was initialized with weak noise. 
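
The signals described above could be generated along the following lines. The sketch makes several assumptions that the text does not settle: the white noise is taken to be Gaussian with a standard deviation of 15% of the unit signal amplitude, the seed is arbitrary, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
T, washout, noise_level = 2000, 150, 0.15

t = np.arange(T)
clean_input = np.sin(0.24 * t)
# Assumption: Gaussian white noise, std = 15% of the (unit) signal amplitude.
noisy_input = clean_input + noise_level * rng.standard_normal(T)
target = np.sin(0.24 * t) ** 3

# The first 150 iterations only initialize the network and are excluded
# from the optimization.
train = slice(washout, T)
n_train = len(target[train])
```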



5 Results and Discussion 

Figure 2 displays the output of the trained network with respect to the target 
signal for various values of $\alpha$. From Figure 2, it can be seen that the 
unmodified ESN is most sensitive to noise in the input signal. As expected, 
the case of $\alpha = 1.0$ results in output that well reflects the structure 
of the input signal, since the network dynamics have been optimized to reflect 
the dynamics of the target signal. However, since this network receives no 
input it can no longer synchronize with the target signal. Best performance 
was observed for quite high values of $\alpha$, corresponding to a network 
with dynamics that are a mixture of both echo states (that are a function of 
the input history) and self-prediction states (that are a function of the 
network’s model of the input history). 

Figure 3 displays the mean square error relative to signal amplitude for 
varying values of $\alpha$. As $\alpha$ increases, the performance of the 
network improves up to a value of approximately 0.97, at which point the 
performance decreases sharply. Thus, the best-performing network has dynamics 
primarily governed by the network model, with minimal correction provided by 
the input signal. 

Fig. 2. Effect of varying $\alpha$ for the modified ESN operating on input data with 0.015% 
relative error. Solid line shows network output, dashed line shows the target output 
signal synchronous to the network input. The trained pattern is $\sin^3(0.24t)$, the input 
signal is $\sin(0.24t)$. Top left: $\alpha = 0$ (standard ESN); top right: $\alpha = 0.8$; bottom left: 
$\alpha = 0.97$; bottom right: $\alpha = 1.0$. 

The present study represents an initial investigation of the efficacy of 
improving the performance of ESNs by adding a self-prediction module to normal 
echo state dynamics. Preliminary results indicate that the inclusion of 
self-prediction may improve the performance of an ESN performing a function 
mapping task in the presence of additive noise. We emphasize that the present 
results should not be taken as conclusive. However, the inclusion of 
self-prediction modules may represent an interesting direction for extending 
the basic ESN architecture. 

In principle, self-prediction in recurrent neural networks is a proven 
paradigm for learning and modeling signals. In the present approach, a 
self-prediction module, which is optimized using standard feed-forward 
methods, is used as a proxy for the recurrent weights, avoiding difficulties 
associated with other methods of training hidden weights in a recurrent 
network. As a next step, the architecture must be tested with non-stationary 
signals. A related problem is that the present prediction is related to a 
single pattern. Preliminary investigations indicate that the present 
architecture, which utilizes linear prediction, may be inherently unstable 
when attempting to store multiple patterns. Modifications of the current 
architecture have shown promising results in simultaneously learning multiple 
functions, with changes to the learning method and the connectivity matrix 
leading to improved results. 

Fig. 3. Network error for a noisy input signal: the network error depends on the 
value of $\alpha$; for high values of $\alpha$ the network error increases again due to the lack of 
synchronization of the output signal with the input signal. 

Finally, it should be mentioned that Jaeger [1] suggested a different setup 
for a pattern generator using ESNs. The capabilities of Jaeger’s pattern 
generator are equivalent to the functionality of our system in the case of 
$\alpha$ being 1.0. However, tuning of the $\alpha$ parameter is not possible 
there. 

Abstract. A gene network depicts the inter-regulatory relations among 
genes. Knowledge of the gene network is key to an understanding of bi- 
ological processes. A Bayesian network, consisting of nodes and directed 
arcs, is a convenient vehicle to model gene networks. We described a non- 
linear model for the rate of gene transcription. Levels of gene expression 
are continuous in the model. We employed a genetic algorithm to evolve 
the structure of a Bayesian network. Given a candidate structure, the 
best parameters are estimated by the downhill simplex algorithm. The 
methodology features a reconstruction resolution that is limited by data 
noise. We tested the implementation by artificial gene networks in simu- 
lations. We then applied the methodology to reconstruct the regulation 
network of 27 yeast cell cycle genes from a real microarray dataset. The 
result obtained is promising: 17 out of the 22 reconstructed regulations 
are consistent with experimental findings. 



1 Introduction 

With the advent of DNA microarray technology, researchers can now measure the 
expression levels of all genes of an organism in a single assay [1][2]. Measurements 
have since been carried out to observe the state of cells undergoing developmen- 
tal program or subjected to experimental/environmental stimuli. Analysis of 
microarray data by clustering methods has become routine. In the analysis, pat- 
terns of gene expressions across time points or different treatments are grouped 
into clusters. The function of an unknown gene can then be suggested from that 
of the known gene in the same cluster [3]. 

Although cells of the same species carry the same genetic blueprint in the 
DNA, not all genes express at any given time and those that express are likely 
to express to different levels. Gene products, mainly proteins, run or catalyze 
the biochemical processes in living organisms. When and to what extent a gene 
is regulated by other genes are key to an understanding of life. A step beyond 
clustering is therefore to reconstruct gene regulation networks [4][5]. We refer 
readers to Refs. [6] and [7] for a comparison and overview of the various methods 
of gene network modeling and reconstruction. 

Events of gene expression are stochastic [8]. Microarray technology in its in- 
fancy incurs substantial noise. Because of the cost, replications in typical mi- 
croarray experiments remain low. Methodologies to reconstruct gene regulation 
networks should therefore bear rich statistics and probability semantics. A 
graph-based method called Bayesian network (BN) [9][10], well developed in the 
field of Uncertainty and Artificial Intelligence, suits most of the requirements [11]. A 
BN consists of nodes and arcs. In the context of this work, a node represents a 
gene and the value of the node, a random variable, gives the expression level of 
the gene. The existence of an arc between two nodes establishes a probabilistic 
dependence between them. Given a model of gene regulation, the conditional 
probability density for an arc can be written down. With a structure, namely 
the arcs and the linked nodes, the joint probability density of the measured 
gene expression data can be obtained as a product of the conditional probabil- 
ity densities. The task of gene network reconstruction is to find the structure, 
together with the appropriate parameters in it, whose joint probability density 
is maximal. 

In contrast to previous works [12] [13], we treat gene expression levels as con- 
tinuous variables in the BN so that no information is lost in discretization. We 
describe a nonlinear model where the transcription rate of a regulated gene is a 
power law function of the expression levels of the regulatory genes. This power 
law model, having roots in biochemistry, is general and robust [14][15]. We 
extend the model to allow delayed transcription. We then develop the BN for 
time series gene expression data [16]. Instead of the EM algorithm in most BN 
implementations [17], we implement algorithms that are known to be potentially 
capable of finding global maxima in optimization programs [18] [19]. An impor- 
tant feature of our methodology is that the detail of the reconstructed network 
is limited to the noise level of the experimental data. When the data is noisy, the 
output is conservative with only few arcs. More arcs emerge as the noise level 
decreases. Another feature of the methodology is that it is equipped with a signif- 
icance measure that quantifies degrees of confidence when results are presented. 
We applied the method to the microarray data of the yeast Saccharomyces 
cerevisiae during cell cycle [20]. The reconstructed regulation network, involving 27 
genes, contains 22 arcs, 17 of which are consistent with experimental findings. 

2 System and Methods 

In a microarray assay, treated and wild samples are usually labeled with red 
and green fluorescein respectively. The background subtracted intensity of the 
emitted light reflects the abundance of the mRNA in the sample mixture. The 
ratio of the red to green intensity indicates the degree of up or down regulation 
of the gene upon treatment. We model the rate of gene transcription by the 
following equation [15][21], 

$$\frac{dx_i}{dt} = \alpha_i \prod_j x_j^{w_{ji}} - \beta_i x_i, \tag{1}$$

where $x_i$ is the measured expression ratio of gene $i$ and is a non-negative 
real number. $\alpha_i$ and $\beta_i$ are both positive constants, quantifying 
the rate of transcription and degradation, respectively. The product is over 
the set of genes that have arcs pointing to gene $i$. The exponent, 
$w_{ji}$, is real. When $w_{ji}$ is positive (negative), gene $j$ induces 
(represses) the expression of gene $i$. The multivariate power law function 
was coined S-System and has been extensively used in chemical and biochemical 
kinetics [14][22]. It was shown that any nonlinear function can be locally 
represented in power law form if the function is logarithmically 
differentiable [23]. 

Since microarray measurements are conducted at discrete time points, we 
write the model in the discrete form, 

$$x_i(t+1) = \alpha_i \prod_j x_j(t)^{w_{ji}} + (1 - \beta_i) x_i(t) + \epsilon_i(t), \tag{2}$$

where $\epsilon_i(t)$ is the measurement error for gene $i$ at time $t$. For 
simplicity, we assume in the rest of the work that the errors 
$\epsilon_i(t)$ are Gaussian distributed with mean zero and variance 
$\sigma^2$, common to all genes at all time points: 
$\epsilon_i(t) = \epsilon \sim N(0, \sigma^2)$ for any $i$ and any $t$. 
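
One update of the discrete model in Eq. (2) can be sketched as below. The data layout (a list of dictionaries mapping regulator index to exponent), the function name `step`, and the convention that an unregulated gene has no production term are assumptions made for illustration; the two-gene network is hypothetical.

```python
import math
import random

def step(x, alpha, beta, w, sigma=0.0, rng=random):
    """One update of Eq. (2). w[i] maps a regulator index j to the exponent w_ji."""
    x_next = []
    for i in range(len(x)):
        if w[i]:
            production = alpha[i] * math.prod(x[j] ** w_ji for j, w_ji in w[i].items())
        else:
            production = 0.0  # assumption: no regulators means no production term
        noise = rng.gauss(0.0, sigma) if sigma > 0 else 0.0
        x_next.append(production + (1.0 - beta[i]) * x[i] + noise)
    return x_next

# Hypothetical two-gene network: gene 0 only decays, gene 1 is induced by gene 0.
x0 = [1.0, 1.0]
alpha = [0.0, 1.2]
beta = [0.2, 0.6]
w = [{}, {0: 2.0}]
x1 = step(x0, alpha, beta, w)  # noise-free update
```

In this noise-free example gene 0 decays to 0.8 while gene 1 rises to about 1.6, because its production term 1.2 outweighs its degradation.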

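Once we have a candidate structure of the gene regulation network, the 
conditional probability density, $p(x_i, t+1; \theta_i)$, of the expression of 
any gene $i$ in the network at time $t+1$ can be readily written down, apart 
from the prior probability distributions on the $\alpha_i$, $\beta_i$, 
$w_{ji}$, and $\sigma$, which we assume to be uniform, as 

$$\begin{aligned}
p(x_i, t+1; \theta_i) &= p(x_i, t+1; \alpha_i, \beta_i, w_{ji}) \\
&= N\Big(x_i(t+1) - \alpha_i \prod_j x_j(t)^{w_{ji}} - (1 - \beta_i) x_i(t),\ \sigma^2\Big),
\end{aligned} \tag{3}$$

that is, a zero-mean Gaussian density of variance $\sigma^2$ evaluated at the 
residual of Eq. (2). 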



The joint probability density, $P(x|S, \theta)$, of a candidate network of 
structure $S$ and parameters $\theta$ becomes 

$$P(x|S, \theta) = \prod_{t=1}^{T-1} \prod_{i=1}^{n} p(x_i, t; \theta_i), \tag{4}$$

where there are $n$ genes in the network and $T$ microarray measurements 
performed at time points $0, 1, \ldots, T-1$. If the series of measurements is 
repeated $R$ times, the probability densities of the replicates are multiplied 
to get the final joint probability density. To avoid bias, integration of 
$P(x|S, \theta)$ over the parameters $\alpha_i$, $\beta_i$, $w_{ji}$, and 
$\sigma$ with their prior distributions should be carried out. This 
marginalization can prove to be intractable and we employ the approximation 
that approaches the logarithm of the likelihood as long as 
$R \times (T-1)$ is large enough [24], 

$$\mathrm{score}(S) = \log P(x|S, \hat{\theta}) - \frac{d}{2} \log\big(R \cdot (T-1)\big), \tag{5}$$

where $\hat{\theta}$ are the parameters that maximize the likelihood function 
$P(x|S, \theta)$ and the second term, with $d$ equal to the number of 
parameters in the structure, is a penalty against structure complexity. The 
first term of the score metric is simply the minimum of the sum of squared 
errors. The task of gene regulation network reconstruction is cast into the 
search for the structure, together with the embedded parameters, that scores 
highest. The importance of the form of the score function is that we seek 
parsimonious structures that fit the data. Other score functions of the same 
virtue are available [25-27]. They differ in the weight on the penalty 
relative to the sum of squared errors. In the following, we describe a 
powerful search algorithm that, when properly designed, is known to reach 
global extrema. A significance metric of the reconstructed networks will 
also be introduced. 
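
The score of Eq. (5) can be sketched in a few lines. This is an illustration under stated assumptions: the residuals of Eq. (2) are assumed to be already computed at the best-fit parameters, $\sigma$ is treated as a known value rather than estimated, and the function names and numbers are hypothetical.

```python
import math

def log_likelihood(residuals, sigma):
    """Gaussian log-likelihood of the residuals with common variance sigma**2."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    return -0.5 * n * math.log(2.0 * math.pi * sigma ** 2) - sse / (2.0 * sigma ** 2)

def score(residuals, sigma, d, R, T):
    """Eq. (5): log-likelihood minus (d / 2) * log(R * (T - 1))."""
    return log_likelihood(residuals, sigma) - 0.5 * d * math.log(R * (T - 1))

# Two hypothetical structures that fit equally well but differ by one parameter.
res = [0.05, -0.1, 0.02]
s_small = score(res, sigma=0.1, d=8, R=1, T=7)
s_large = score(res, sigma=0.1, d=9, R=1, T=7)
```

With an identical fit, the structure with one extra parameter scores lower by exactly half the logarithm of $R \cdot (T-1)$, which is the parsimony pressure described above.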

3 Algorithm and Implementation 

There are two disparate types of unknowns in our Bayesian network. One is the 
possible combinations of arcs between genes across time points. The other is 
the real-valued $\alpha$’s, $\beta$’s, and $w$’s in the arcs. The expression 
level of a gene at the present time may be affected by the expression levels 
of any genes at the previous time point. There can thus be $n \times n$ arcs 
across time points $t$ and $t+1$, and the number of possible network 
structures amounts to $2^{n \times n}$, which reaches $10^{30}$ for a network 
of 10 genes. As $n$ increases, the number of possible structures explodes. An 
exhaustive search in the structure space can be impractical. We employed the 
genetic algorithm (GA) to explore the structure space [18][19][28][29]. In the 
GA, first of all, a population of candidate networks is set up. Their scores 
are calculated and the networks ranked. Networks with high scores are selected 
as parents. Cross-over operations are then applied to some pairs of the 
parents. Mutation operations are performed on the rest of the parents. A new 
generation of the population is thus formed. The procedure is iterated until 
the average network score of the population reaches a plateau. 
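
The loop described above can be sketched as follows. This is a toy skeleton, not the authors' implementation: the operator details (keeping the top half as parents, a 50/50 split between cross-over and mutation, single-arc initial networks, a fixed number of generations) and the stand-in score function are all assumptions. The first lines also reproduce the count of possible structures for 10 genes.

```python
import random

n_structures = 2 ** (10 * 10)  # 2^(n x n) for n = 10, about 1.27e30

def evolve(score, n_genes, pop_size=40, generations=30, seed=0):
    """Toy GA over arc sets; each network is a frozenset of (regulator, target)."""
    rng = random.Random(seed)
    all_arcs = [(j, i) for j in range(n_genes) for i in range(n_genes)]
    population = [frozenset([rng.choice(all_arcs)]) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # high-scoring networks are kept
        children = []
        while len(children) < pop_size - len(parents):
            if rng.random() < 0.5:
                # Cross-over: inherit each arc of either parent with probability 1/2.
                a, b = rng.sample(parents, 2)
                child = frozenset(arc for arc in a | b if rng.random() < 0.5)
            else:
                # Mutation: create or delete one arc.
                child = rng.choice(parents) ^ {rng.choice(all_arcs)}
            children.append(child)
        population = parents + children
    return sorted(population, key=score, reverse=True)

# Stand-in score (hypothetical): negative distance to a known arc set.
true_arcs = {(0, 1), (2, 3), (0, 4), (5, 4)}
def toy_score(net):
    return -len(net ^ true_arcs)

final = evolve(toy_score, n_genes=6)
best = final[0]
```

In the methodology described here the score would be Eq. (5) evaluated after parameter fitting, and the loop would stop when the average score reaches a plateau rather than after a fixed number of generations.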

When evaluating the score of a candidate network during ranking in the 
GA, we find the parameter values that minimize the sum of squared errors 
by the use of the downhill simplex algorithm for multidimensional function 
minimization [30]. A simplex in d dimensions is a polyhedron of d+1 vertices. In 
the algorithm, a vertex stores a set of possible parameter values, and the polyhe- 
dron, mimicking moves of an amoeba, crawls around the parameter space, creep- 
ing down valleys and shrinking its size down to the bottom of narrow valleys. 
We found the downhill simplex algorithm effective in minimizing the power-law 
function in Eq. (5). 

We encode a network structure in an array which is called ‘chromosome’ in 
GA. The size of the array is variable depending on the number of arcs present in 
the network. Given a structure (genotype), the set of the associated parameters is 
mapped according to the model of Eqs. (1-4). Learning of the optimal parameter 
values (phenotype) is relegated to the downhill simplex algorithm. After learning, 
the optimal values are stored in the arcs (and nodes) which continue to evolve 
in the GA. A network starts with only one arc in the first generation of the GA. 
Arcs can be either created or deleted. The exponent $w$ is initialized to zero 
in any newly created arc. Since the evolutionary operations are on the 
structure and the optimization is on the parameter values, which are inscribed 
in the structure after learning, our GA is Lamarckian. It was shown that a 
Lamarckian GA is more efficient in problems with high dimensionality than a 
pure GA alone [31]. An arc survives only if its benefit (the first term in the 
score metric) outweighs the penalty (the second term in the score). A feature 
of the implementation is that networks grow in complexity up to a point that 
is determined by the noise level of the microarray experiment. 

Fig. 1. The structure of the gene regulation network in the simulation. (a) shows the 
nodes and arcs; (b) is (a) with time information. 

Another important feature of our adoption of the GA is that a significance 
metric can be readily defined and applied at the end of the GA search. The 
last generation of the GA contains a population of candidate networks that 
score very high. The candidate networks whose scores differ by only a fraction 
should be considered competitive, due to the low replication and high noise of 
microarray data. It is reckless to report the highest scoring network as the 
solution. Instead, we calculate the frequencies of arcs that appear in the 
candidate networks in the last generation of the GA. This is similar to the 
feature distribution of Ref. [32]. If the arc pointing from gene $i$ to gene 
$k$ is present in 90% of the networks while the arc from gene $j$ to gene $k$ 
emerges in only 10% of the networks, we conclude that arc $i \to k$ is more 
significant than arc $j \to k$. 
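
The arc-frequency significance measure can be sketched as follows. The representation of a network as a set of (regulator, target) pairs and the ten-network population are hypothetical, chosen to mirror the 90% versus 10% example in the text.

```python
from collections import Counter

def arc_frequencies(networks):
    """Fraction of networks in the final generation that contain each arc."""
    counts = Counter(arc for net in networks for arc in set(net))
    return {arc: c / len(networks) for arc, c in counts.items()}

# Hypothetical final generation of 10 candidate networks over genes i, j, k.
final_generation = [{("i", "k")}] * 8 + [{("i", "k"), ("j", "k")}] + [set()]
freq = arc_frequencies(final_generation)
```

Here the arc from i to k appears in 9 of 10 networks and the arc from j to k in 1 of 10, so the former would be reported as the more significant.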

4 Results 

4.1 Simulation 

We tested our methodology by simulation. Figure 1 shows the structure of our 
assumed gene regulation network consisting of 6 genes and 5 arcs. The parame- 
ters in the structure are furthermore taken to be. 
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generation 



Fig. 2. Top (bottom): increase in the average score (number of arcs) of the candidate 
networks as networks evolve in the GA. 



x\ — x\ = —0.2xi 

x'2 — X2 = l.2xf° — 0.6x2 

3:3 — X3 = — 0 . 7 x 3 

X4 — X4 = 1.8x3 ° ~ 1 - 5 x 4 
X5 — X5 = 2.0xi^^ °Xg — 1.0x5 
Xg — xe = — O.lxg 



(6) 



where x( and Xi are the expression ratios of gene i at the next and present time 
point, respectively. We assumed the Xi’s are measured at 7 equally spaced time 
points and that Xi(0) = 1 for all i. The goal now is to reconstruct from scratch 
the structure (5 arcs) and the 15 associated parameters (5 w’s, 4 a’s, and 6 /3’s) 
from the 36 (= 6 genes x 7-1 time points) data points, assuming the dynamic 
model of Eq. (2). Note that for a fully connected network, there need be 36 w’s 
+ 6 a’s -I- 6 /3’s = 48 parameters. With only 36 data points, the solution is 
under-determined. Our approach is to start evolving candidate networks with 
only one arc; these initial networks try to fit the 36 data points with only 1 w 
+ \ a + 6 P’s = 8 parameters. As networks evolve, more arcs, and thus more 
parameters, emerge to fit the data until the score/arcs stop increasing.^ 

^ In the case of zero noise in the simulation, an effective noise level can be estimated 
from the data by an iterative procedure [33]. 

Note that the expression patterns of gene 1 and gene 6 are similar in that 
they are not regulated by others and their decay constants differ by a factor of 2. 
If we replace the arc from gene 1 to gene 2 with an arc from gene 6 to gene 2 and 
change the value of w_12 in Eq. (6) from 2.0 to 4.0, we obtain an identical dataset. 
The program cannot differentiate gene 1 from gene 6 and should thus return, 
with equal chance, different solutions (structures and parameters) that yield 
the same highest score. We ran our implementation with a population size of 
200 and stopped when the average network score of the population stopped climbing. 
Figure 2 shows the increase in the average score as the program 
proceeds. From the trend we learned to terminate the program at generation 
25. Figure 2 also shows the average number of arcs of the candidate networks 
as a function of the generation number. Figure 3 shows the histograms of the 
arcs in the candidate networks at generation 25. The 5 tallest columns give 
the 5 most significant arcs, which are indeed the arcs from which the data were 
generated. Note that since gene 1 and gene 6 are indistinguishable, column 1 
and column 6 in the upper right and lower left histograms of Fig. 3 are indeed 
roughly of the same height (significance). 

Fig. 3. Histograms of the number of arcs from genes to gene 1 at the upper left, to 
gene 2 at the upper right, to gene 3 at the middle left, to gene 4 at the middle right, to 
gene 5 at the bottom left, and to gene 6 at the bottom right. The number of networks 
in the population is 200. 
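
The arc histograms of Fig. 3 amount to counting, for each possible arc, how many networks of the final population contain it. A minimal sketch of such a tally is given below. The data structure (a network as a list of (source, target) pairs) and the toy population are assumptions made for illustration; they are not taken from the authors' implementation.

```python
from collections import Counter


def arc_frequencies(population):
    """Fraction of networks in the population that contain each arc.

    population is a list of networks; each network is an iterable of
    (source, target) gene-index pairs.
    """
    counts = Counter()
    for network in population:
        counts.update(set(network))  # count an arc once per network
    return {arc: n / len(population) for arc, n in counts.items()}


if __name__ == "__main__":
    toy_population = [            # hypothetical final generation
        [(1, 2), (3, 4)],
        [(6, 2), (3, 4)],
        [(1, 2), (3, 4)],
        [(6, 2), (3, 4)],
    ]
    for arc, freq in sorted(arc_frequencies(toy_population).items()):
        print(arc, freq)
```

In the toy population the arcs (1, 2) and (6, 2) split the networks evenly, which mimics the degenerate solutions for the indistinguishable genes 1 and 6, while the arc present in every network receives the highest frequency.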

4.2 Application to Yeast Cell-Cycle Microarray Data 

Before applying to real biological data, we extend the model of regulation, Eq. 
(2), to accommodate delayed transcription: 

$$x_i(t+1) = \alpha_i \prod_j x_j(t)^{w_{ji}} \prod_k x_k(t-1)^{w_{ki}} \prod_l x_l(t-2)^{w_{li}} + (1 - \beta_i)\, x_i(t) + \epsilon, \qquad (7)$$

where expression of gene k at time point t − 1 affects that of gene i at time point 
t + 1, and so on. The number of different time lags depends on the data. For 
example, in a time series experiment with only 7 time points, we may allow a 
maximum of 3 time lags. The structural search space trebles in such case. 
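
A one-step prediction under a delayed power-law model of this kind might be sketched as follows. The sketch assumes that the deterministic part is a rate constant alpha_i times a product of lagged expression levels raised to weights w, plus the undecayed remainder (1 − beta_i) x_i(t); the argument layout and the function name are hypothetical, and the noise term is left out. Expression levels are taken to be positive so that fractional powers are defined.

```python
def predict_next(i, history, alpha, beta, arcs):
    """Deterministic part of a delayed power-law update for gene i.

    history[-1] holds x(t), history[-2] holds x(t-1), and so on.
    arcs is a list of (source_gene, lag, w) regulating gene i, where
    lag = 0 means time point t, lag = 1 means t-1, etc.
    The Gaussian noise term is left out.
    """
    production = alpha[i]
    for source, lag, w in arcs:
        production *= history[-1 - lag][source] ** w
    return production + (1.0 - beta[i]) * history[-1][i]


if __name__ == "__main__":
    history = [[1.0, 2.0],   # x(t-1) for genes 0 and 1
               [1.0, 4.0]]   # x(t)   for genes 0 and 1
    arcs_into_gene0 = [(1, 0, 0.5), (1, 1, 1.0)]  # gene 1 at t and at t-1
    print(predict_next(0, history, alpha=[0.5, 0.0], beta=[0.2, 0.1],
                       arcs=arcs_into_gene0))
```

Tagging each arc with its lag is one way to see why the structural search space grows with the number of allowed lags: every (source, target) pair can now appear once per lag.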

We applied the methodology to a standard time-series microarray dataset on 
the yeast Saccharomyces cerevisiae during cell cycle [20]. At different stages of 
a cell cycle, different sets of genes are activated with products performing vital 
cell-cycle dependent functions such as budding, DNA replication/repair, mitosis 
control, cytokinesis, and mating. Coordinate expression of the stage-specific 
genes during cell cycle is thus an important feature of cell cycle regulation [34]. To 
model the regulation, x_i on the left-hand side of Eq. (7) represent the expression 
ratios of those cell-cycle dependent genes (target genes) while x_j, x_k, x_l on the 
right are those of the transcription factors. Establishment of an arc represents a 
regulation of a target gene by a transcription factor. 

Among the four datasets available in the database [20], we chose to analyze the 
'α factor' dataset where genome-wide (7,220 ORFs) expressions were measured 
at 18 time points at an even time interval of 7 minutes. Since the time series 
spans roughly two cell cycles [20], we allow 5 time lags on the right of Eq. (7): 
t, t−1, t−2, t−3, t−4. 

To get the noise level of the measurement, i.e. σ of the Gaussian noise ε 
in Eq. (7), which limits the resolution of our reconstruction, we histogram the 
genome-wide expression ratios taken at the first time point when the cells are 
just about to grow. If there were no noise, the red to green ratios would read 
one. Figure 4 shows the histogram. The standard deviation of a Gaussian fit to 
it gives the estimated noise level of the experiment: σ = 0.19. 
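
The noise estimate can be illustrated with a short sketch. It fits a normal distribution to a list of expression ratios using the sample mean and standard deviation, which is a simpler stand-in for the Gaussian fit to the histogram described in the text and is more sensitive to outliers. The ratios below are synthetic draws generated for illustration only; they are not the yeast measurements, and the function name is hypothetical.

```python
import random
import statistics


def estimate_noise(ratios):
    """Return (mean, sigma) of a normal distribution fitted to the ratios."""
    fit = statistics.NormalDist.from_samples(ratios)
    return fit.mean, fit.stdev


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for first-time-point ratios; NOT the yeast data.
    synthetic = [rng.gauss(1.0, 0.19) for _ in range(7220)]
    mean, sigma = estimate_noise(synthetic)
    print(f"mean = {mean:.2f}, sigma = {sigma:.2f}")
```

With real data, genes that are already differentially expressed at the first time point would widen a moment-based estimate, which is one reason to prefer a fit to the central peak of the histogram.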

At least 11 yeast transcription factors (SWI4, SWI6, STB1, MBP1, SKN7, 
NDD1, FKH1, FKH2, MCM1, SWI5, ACE2) and one cyclin gene (CLN3) are 
known to activate cell-cycle dependent genes [35] [36]. Although about 800 genes 
showed periodic expression during a cell cycle [20], only a few activator-target 
pairs are known. In this pilot study, we select 15 target genes (CLN1, CLN2, 
CLB5, CLB6, GIN4, SWE1, CLB4, CLB1, CLB2, TEM1, APC1, SPO12, 
CDC20, SIC1, FAR1) whose transcription activators are known. We stop the 
program after the average score of the 3,600 networks in the GA population 
ceases to grow. Figure 5 shows the reconstructed regulation network. The program, 
written in Java, finishes in one hour on a 2.0 GHz Pentium IV PC running 
Redhat 7.3 Linux. 

To help assess the performance of our methodology, we summarize what is 
known about yeast cell cycle regulation [37-45]. MBF (a complex of MBP1 and 
SWI6) and SBF (a complex of SWI4 and SWI6) control late G1 genes. MCM1, 
together with FKH1 or FKH2, recruits the NDD1 protein in late G2 and controls 
the transcription of G2/M genes. SWI5 and ACE2 regulate genes at M/G1. 
Only 5 arcs (NDD1→CLN2, NDD1→CLB6, STB1→SPO12, CLN3→SPO12, 
and CLN3→GIN4) of the 22 reconstructed arcs in Fig. 5 are not accounted 
for by experiments to date. 

Fig. 4. Distributions of the yeast cell cycle gene expression ratios. The solid line 
histogram is the 7,220 ratios in the dataset at the first time point. The smooth curve 
shows a Gaussian fit to it. The mean of the Gaussian is 0.93 and the standard 
deviation is 0.19. The dashed line histograms the 7,220 ratios at the second time point. 
Comparing the two histograms, we see that some genes started their expression by the 
second time point. The data is freely downloadable from the website in Ref. [20]. 



5 Discussion 

We indicated that a positive (negative) w_ji represents an activation (repression) 
of gene j on gene i in the power-law model. In Fig. 5 we observe some repressions 
though the transcription factors are known to activate the target genes in the cell 
cycle. Transcription of a gene is more likely effected by joint binding of multiple 
transcription factors on the promoter region of the gene in the DNA [46][47]. 
Interpretation of the sign of w_ji thus becomes less clear and our reconstruction 
of, say, the arc FKH2→CLB2 can indicate a dominating role of FKH2 on CLB2 
over MCM1, FKH1, and NDD1 on CLB2. 

As the fitting error drops below the experimental noise (σ), the first term 
of the score function, Eq. (5), approaches an asymptotic value. Further growth 
of the network will become disfavored because of the penalty term of the score. 
Networks thus effectively cease growing in the GA. The noise level of the ex- 
periment limits the detail of the network that our methodology can reconstruct. 
The sparse network of Fig. 5 reflects this feature. 

mRNA is the mediating template between the gene and protein. The gene 
regulation network reconstructed from microarray data, which record mRNA 
abundances, thus serves only as a clue to the underlying true gene regulation 
network in that pieces of information (mRNA translations and protein-DNA 
interactions) are lacking. Post-translational modification and alternative splicing 
further complicate things. Considering the high noise and low replication 
characteristics of microarray data, we believe that our methodology is very 
promising in elucidating gene regulation networks. 

Fig. 5. The reconstructed model of regulation of the yeast cell cycle dependent genes 
by the transcription factors. Solid, dashed, dotted, dot-dashed, and dot-dot-dashed 
lines represent the influence of regulation from time points t, t−1, t−2, t−3, and 
t−4, respectively. The signs of the w's are positive (negative) for solid (empty) arrow 
heads. See text for a discussion of the up or down regulation. 

Ways of relaxing the assumption of a constant noise variance, as in the model 
of Eq. (7), toward a more realistic noise distribution function would depend on 
the microarray technology, experimental protocol, and data normalization [48]. 
As the techniques advance and converge, our methodology can take into account 
the details through, for example, a prior on the noise variance. 

As the search space of the problem grows exponentially with the number of 
genes, a limitation to our methodology lies in its intense demand for CPU. Hard- 
coding of prior arcs into the candidate networks can reduce the search space. To 
reconstruct regulation networks consisting of a large number of genes, adoption 
of known regulation pathways/networks from databases in our methodology via 
priors not only mitigates the computational burden but also accords with the spirit 
of the Bayesian approach. 

It was recently found that a wide range of networks, including the network 
of metabolism [49], the network of neural cells [50], power line grids [51], the 
World Wide Web [52], networks of scientific citations [53], etc., show the scale-free 
property that the number of nodes with k connections, N(k), follows a power 
law [54], 

$$N(k) \propto k^{-\gamma}, \qquad (8)$$

where γ is between 1 and 3. It is conceivable that the connectivities of gene 
regulation networks may also follow a power law in that organisms evolve from a 
minimal form and new genes that have links to popular genes are likely to emerge 
to conserve resources. Scale-free networks also have the property of resilience to 
random attacks (mutations) on nodes (genes) [55], which is a favorable feature 
for species to survive in harsh environments. In our methodology, we can easily 
honor the scale-free property by including a penalty term that is proportional 
to deviation of the candidate structure from the proposed form of Eq. (8). The 
other approach is to modify the cross-over and mutation operations of the GA 
so that the resulting structures follow the power law. This latter approach is 
advantageous if the penalty term (and especially its weight relative to the other 
terms in the score metric) for a certain variant of the power-law is not readily 
expressed (determined) [56]. 
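
One conceivable form of such a penalty term is sketched below. It is purely illustrative: the squared-deviation measure, the default exponent gamma, and the weight are assumptions chosen for the example, not the form used or proposed by the authors, and the text itself notes that the relative weight of such a term is not readily determined.

```python
from collections import Counter


def scale_free_penalty(arcs, gamma=2.0, weight=1.0):
    """Squared deviation of the degree distribution from k**(-gamma).

    arcs is a collection of (source, target) pairs. Degree counts both
    incoming and outgoing arcs; genes without arcs are ignored.
    """
    degree = Counter()
    for source, target in arcs:
        degree[source] += 1
        degree[target] += 1
    if not degree:
        return 0.0
    observed = Counter(degree.values())      # N(k) for the candidate
    n_nodes = len(degree)
    k_max = max(observed)
    norm = sum(k ** -gamma for k in range(1, k_max + 1))
    penalty = 0.0
    for k in range(1, k_max + 1):
        expected = k ** -gamma / norm        # normalized power law
        penalty += (observed[k] / n_nodes - expected) ** 2
    return weight * penalty


if __name__ == "__main__":
    star = [(0, 1), (0, 2), (0, 3)]          # one hub, three leaves
    print(scale_free_penalty(star))
```

A term of this kind would be subtracted from the network score, so candidate structures whose degree distribution strays from the assumed power law are disfavored during selection.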

With a minor modification to the dynamic model of Eq. (2), gene deletion 
expression data can be analyzed by our methodology. In this scenario[57] [58], the 
product over the time index in Eq. (4) is replaced by a product over a mutation 
index for different expression datasets from different mutant strains. The other 
modification is on the GA operations to forbid structures with chains of arcs 
forming closed loops. 

In summary, we applied the method of Bayesian networks to reconstruct 
gene regulation networks from time series microarray data. We described a non- 
linear function to model the change of gene expression levels over time. Levels 
of gene expression are continuous in our analysis. We employed a scoring met- 
ric related to the probability density of the measured expression data. Network 
reconstruction became an optimization problem where relations among genes 
are sought to maximize the score. We implemented the downhill simplex algo- 
rithm for parameter estimation and genetic algorithm for structure search. We 
devised operations that do not evolve structures one way in the genetic algo- 
rithm; arcs can be either added or removed to prevent data over-fitting. Our 
methodology was tested in simulation to avoid local optimal solutions and to 
find degenerate global optimal solutions. The ensemble of the networks in the 
last generation of the genetic algorithm helped provide an implementation for 
a confidence metric. Application of our methodology to reconstruct yeast cell 
cycle gene regulation from DNA microarray data showed encouraging results. 
Our network reconstruction methodology is amenable to such imposition as the 
speculated scale-free network property. 

Abstract. We review the structure of cerebral cortex to find out the number of 
neurons and synapses and its modular structure. The organization of these 
neurons is then studied and mapped onto the framework of an artificial neural 
network (ANN). The computational requirements to run this ANN model are 
then estimated. The conclusion is that it is possible to simulate the mouse cortex 
today on a cluster computer but not in real-time. 



1 Introduction 

Our vision is that we will one day be able to emulate an artificial neural network 
(ANN) with size and complexity comparable to the human cortex. The naive 
approach to this grand task is to build a biophysically detailed model of every neuron 
in the human cortex. But doing this is not feasible within the foreseeable future, even 
in a very long perspective. We will assume here that the functional principles of 
cortex reside at a much higher level of abstraction than the level of the single 
neuron, i.e. closer to that of an ANN of the type used in connectionist models. 

The target computational structure in our vision is not a super-computer that fills 
an entire room but a small structure no larger than a human brain. Having such an 
artificial nervous system (ANS) would take us closer to the goal of building truly 
intelligent robots and electronic agents. 

A short-term goal is to build a cortex-sized multi-network of ANNs and run it on a 
parallel super computer. Parallel cluster computers are getting more and more 
common today and they offer a large amount of memory and computational power. 
An application such as an ANN, which is inherently parallel, is well suited to run on 
this type of computer. 

The problem of building an information processing system equivalent to the 
mammalian cortex can be divided into three sub-problems: defining the size and 
complexity of the ANS, describing the algorithms and implementation of these 
algorithms, and actually constructing the hardware needed to run these algorithms. In 
this paper we focus on the first of these three tasks. 

The steps towards defining the size and complexity of the ANS are the following: 
First we must find out the size and dimension of the corresponding biological 
systems. Secondly we need to define what type of network architecture we aim for 
and how the biological neurons are mapped onto this architecture. Thirdly we want to 
estimate the computational requirements posed by the model. 

Here we focus on the large-scale properties of a system comparable to the 
mammalian cortex. We do not consider the internal representation or how the data is 
processed (e.g. time series). We are only interested in the complexity in terms of 
memory, computations, and communication required to emulate such a system. 

1.1 Our View of the Mammalian Cortex 

The mammalian cortex is a structure composed of a vast number of repeated units, 
neurons and synapses, with similar functional properties though with several 
differences between brain areas, individuals, and species with regard to e.g. number 
and types of neurons and their anatomical and functional arrangement. The state of 
these units is primarily expressed in their electrical activity but many other internal 
variables, like subthreshold intracellular potential and Ca2+ ion concentration, change 
as well. 

A long-standing hypothesis is that the computations in cortex are performed based 
on associations between units, i.e. cortex implements some form of associative 
memory. Individual projections (bundles of connections) between populations of 
neurons can either be feed-forward, feed-back or reciprocal. It is also possible to have 
a network that combines these three types of connections. A substantial part of the 
cortical connectivity is recurrent, within areas as well as between them. It is further 
reasonable to assume that this network is symmetrically connected, at least in some 
average sense [1]. This makes it plausible to assume that the cortex to a first 
approximation operates as a fix-point attractor memory [2, 3]. Recently additional 
support for the relevance of attractor states has been obtained from cortical slices [4, 5]. 

This idea was already captured in the early work of Donald Hebb. A prototypical 
attractor memory like the recurrent Hopfield network can be seen as a mathematical 
instantiation of Hebb’s conceptual theory of cell assemblies. In a general sense, in the 
following we thus regard the cortex as a huge biological attractor memory system. 
This top-down view helps to define the problem of modeling and implementing a 
system of such a high dimensionality and complexity as the mammalian cortex. 



2 The Mammalian Cerebral Cortex 

A primary task is to study the neuronal systems in nature. By doing this we can define 
what the properties are of the system we are trying to model. We start by finding out 
how many neurons and synapses there are in the cortices of mammals. By studying 
how they are organized and modularized and what their functional properties are we 
expect to find out the dimensions and complexity for a cortex sized ANN. 
Furthermore we expect to be able to define the timescales of a cortex-simulation 
running in real-time. While collecting data for a highly reduced model, we accept 
that the data are sometimes contradictory and incomplete. 

2.1 General Characteristics of Cortex 

The cortex is attributed the higher functional capabilities of the brain and sits on top 
of the basal, older, parts of the brain. The neocortex of mammals is generally quite 
homogenous [6] and it has seen a great increase in size during evolution. Pyramidal 
cells with far-stretching axons and locally highly connected interneurons constitute 
the two major cell types. Approximately 75-80% of the neurons are of the pyramidal 
type and the remaining 20-25% are mainly of the 10 types or so of inhibitory 
interneurons [7, 8]. In humans the cortex is about 3 mm thick [9, 10] and in mouse 
about 0.8 mm [6]. The layered structure of the cortex seems to apply to all mammals 
although the layers can be hard to distinguish, especially since the cortex is more 
compact in smaller mammals with smaller brains. An interesting property that applies 
to all mammals is a constant neuron density of about 10^5 mm^-2 [6, 8] except for V1 in 
primates [6, 11]. The average number of synapses per neuron found in different areas 
and in different species varies between 2·10^3 and 2·10^4 [8]. The average number of 
synapses per neuron in mouse [8], cat [12], and man [13] is about 8·10^3. A question 
that arises is why the human cortex is so much thicker than that of mouse. The answer 
is probably that the intracortical axons in humans are longer and therefore more 
myelinization is used. This is supported by the fact that cortex is thicker in the 
association areas where the connectivity is higher [6]. 

2.2 Timescales of Cortex 

The primary timescale when simulating a network of electrically active units is the 
time it takes for a unit to change its electrical potential. This time sets an upper bound 
for the resolution needed in the simulation. Another defining timescale is that of the 
connection plasticity between units, how rapidly the strength of a connection can 
change. If one thinks about a system implementing an attractor memory it is also 
interesting to know the time it takes for the system to go to a fix-point. 

The initiations of excitatory and inhibitory postsynaptic currents are fast processes 
that occur on timescales of less than 1 ms [14]. These currents change the electrical 
potential of the neuron, which forms the basis of neural computations. 

The potential in the neuron changes on timescales of about 5-10 ms [3]. 

Approximately 50% of the spikes generated by pyramidal cells are grouped i.e. the 
spikes occur in a burst. The mean spike interval in a burst is approximately 10 ms. A 
single spike generates a low probability of synaptic release, but a train of spikes gives 
a probability of synaptic release close to 1. The high synaptic release probability 
comes at the price of a lower temporal resolution [14]. We take this as an argument 
for replacing a burst of high frequency spikes with a single spike in an ANN. 

Experiments performed on monkeys indicate that the temporal precision of neural 
activity in response to a visual stimulus is on the average 10 ms and sometimes as 
high as 2 ms [15]. This means that cortical neurons are able to reliably follow events 
in the external world with a resolution of approximately 2-10 ms [14]. 

Synaptic plasticity occurs over a broad spectrum of time scales [14, 16]. The 
fastest event is facilitation, which occurs on time scales of 10-100 ms. The slowest 
events, long-term potentiation and depression, occur on timescales of hours, days, 
or longer. 

To summarize the discussion: around 10 ms is the smallest timescale at which 
information is exchanged and thus this is an appropriate timescale when one simulates 
neurons with an ANN [3]. The probable timescale for attractor dynamics, i.e. 
convergence to a fix-point, ranges from 30 to 200 ms [3]. Hebbian modifications 
(synaptic plasticity) occur on timescales of a few hundred milliseconds and longer. 

2.3 The Minicolumn 

The concept of the cortical minicolumn as both a functional and anatomical unit is 
today well established [7, 17, 18]. The idea that there exist small regions in cortex 
with homogenous response properties was first proposed by Lorente de Nó in 1938 
[7]. The homogenous response properties of small columnar volumes in cat's somatic 
sensory cortex were first described by Mountcastle [19] in 1956. Later a more 
detailed description of these columns and their functionality was given by Hubel and 
Wiesel [20, 21]. The work of both Mountcastle and Hubel and Wiesel proposed the 
minicolumn as the principal functional unit in cortex. The atomic view of the 
minicolumn was further refined in the work of Buxhoeveden et al. [22]. Today the 
minicolumn has been found in the cortex of most mammals [7] both as an anatomical 
and functional structure. 

An argument for a columnar organization of the neurons comes from how the 
axons traverse and connect in the superficial layers. The collateral axon travels a 
lateral distance of up to 3.5 mm without giving off any terminal branches; then it 
produces a tight terminal cluster of 300-600 µm [23]. Another argument favoring 
columnar organization over single neurons holds that the cortex lacks enough 
myelinated bundles to connect every neuron reciprocally in each cortical hemisphere. 
Estimates of the number of callosal fibers in humans range from 2·10^8 to 5·10^8 [7]. A 
possible solution to the connectivity dilemma proposed by Buxhoeveden and 
Casanova is that minicolumns and not cells establish contact with each other [7]. 

A minicolumn is a vertical column with about 100 neurons that stretches through 
the layers of the cortex. Each minicolumn has afferent input, efferent output, and 
intra-cortical circuitry [22]. Within each minicolumn there are both excitatory 
pyramidal neurons and inhibitory interneurons [22]. The horizontal diameter of a 
minicolumn varies slightly between different cortical areas and mammalian species. 
The typical minicolumn has a total diameter of about 50 µm and an inner circle with a 
diameter of approximately 30 µm where the neuron density is high [22]. In mammals 
a minicolumn diameter of about 20-60 µm has been found [7, 8, 18, 22, 24-26] and 
in humans the diameter is about 40-50 µm [7]. The number of neurons in a 
minicolumn has been estimated to be 80-100 [7, 18], 100 [24, 27], 110 [25], and 80-260 
[26]. There are studies that indicate that the neuron count in primate minicolumns of 
V1 is above 200 [6]. In smaller mammals the minicolumn organization is less visible. 

Although there exist some differences between minicolumns located in different 
parts of the cortex (such as the exact size and structure and active neurotransmitter), the 
minicolumn seems to represent a general building block of the cortex [28, 29]. 

There are large differences in the cortical surface area between different mammals 
[9]. The ratio between the cortex area in mouse, macaque monkey and human is 
approximately 1:100:1000, yet the thickness only varies by a factor of 4. This means that 
cortex is enlarged through an increase of its area and not its thickness [27], and it 
also means that the number of minicolumns and not their size is increased when cortex 
expands, which has been seen in transgenic mice [30]. 



2.4 The Hypercolumn 



Another structure seen in the cortex of mammals is the hypercolumn. Hypercolumns 
are sometimes referred to as macrocolumns and also ocular dominance columns when 
found in the visual cortex. A hypercolumn is a structure that, similar to the 
minicolumns, stretches vertically through the cortex. Often it has the form of bands 
300-1000 µm wide [7, 16, 18, 21, 24-26] and up to 3000 µm long [26]. Hubel and 
Wiesel pioneered the study of the hypercolumns of cat's and macaque monkey's 
visual cortex [20, 21]. They named the structure "hypercolumn" and we will use this 
here. 

A hypercolumn contains a number of minicolumns. The exact belonging of a 
minicolumn may be somewhat diffuse and sometimes a minicolumn belongs to 
several hypercolumns. The typical number of minicolumns in a hypercolumn has 
been estimated to be 60-80 [7] and 80 [18]. 

It seems very possible that the minicolumns within a hypercolumn have a 
functional dependency. Buxhoeveden and Casanova have stated that the output of a 
hypercolumn results from tightly knit interactions between the smaller minicolumns 
within it [7]. Hubel and Wiesel showed by electrophysiological experiments in 
primates that the hypercolumn functions as a winner-take-all (WTA) circuit in the 
visual cortex [21]. 



Fig. 1. Two hypercolumns and their internal structure. Legend: hypercolumn, 
minicolumn, basket neurons, chandelier neurons, inhibitory connection, excitatory 
connection, cortical excitatory connection. 



In Fig. 1 we show schematically how the minicolumns are grouped into 
hypercolumns. The minicolumns are arranged around a pool of inhibitory basket 
neurons that provides inhibition for all types of neurons [29]. The pyramidal neurons 
in a minicolumn have excitatory connections to the basket neurons and they receive 
inhibitory input from the basket neurons. The reciprocal connection between all 
minicolumns and the basket neurons facilitates the WTA circuitry [31]. Inside each 
minicolumn there is a pool of chandelier neurons. The chandelier neurons have a 
highly localized axonal arborization and provide inhibitory synapses that exclusively 
target the initial segments of the pyramidal axons [29]. 
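
At the level of an ANN, the effect of the shared basket-neuron inhibition can be abstracted as a winner-take-all normalization over the minicolumns of one hypercolumn. The following sketch shows a hard and a soft variant. Both are illustrative abstractions; in particular, the softmax form and the gain parameter are assumptions made here and are not stated in the text.

```python
import math


def hard_wta(support):
    """Only the minicolumn with the largest support stays active."""
    winner = max(range(len(support)), key=lambda i: support[i])
    return [1.0 if i == winner else 0.0 for i in range(len(support))]


def soft_wta(support, gain=1.0):
    """Softmax normalization: activities are positive and sum to one."""
    top = max(support)  # subtract the maximum for numerical stability
    weights = [math.exp(gain * (s - top)) for s in support]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    support = [0.2, 1.5, 0.7]   # hypothetical input to three minicolumns
    print(hard_wta(support))
    print(soft_wta(support, gain=2.0))
```

In both variants the total activity within a hypercolumn is normalized, which corresponds to the idea that the minicolumns compete through a common inhibitory pool.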

Intracortical connections exist between minicolumns in different hypercolumns. If 
the axon terminates on pyramidal neurons in the targeted minicolumn, the connection 
is excitatory. But if the axon terminates on chandelier neurons in the targeted 
minicolumn the connection is inhibitory. 

A connection between two minicolumns can be supported by one or several 
synapses. Most of the synapses in the minicolumns are devoted to receiving input 
from distant minicolumns situated in other hypercolumns i.e. the number of 
connections in cortex is dominated by the corticocortical connections [2]. The number 
of local connections within a minicolumn is an order of magnitude smaller than the 
corticocortical connections. Also the intra-hypercolumnar connections between 
minicolumns can be neglected. 

2.5 Numbers of the Cortex 

The numbers relating to cortical micro- and macro-architecture vary a lot. This 
variation is due to uncertainties in measurements but also due to large variations 
between individuals [32]. Neuron counts for the entire neocortex are generally 
extrapolations of pinpoint counts. Therefore the estimated number of neurons depends 
on which area of cortex the measures are taken. Neuron counts are further sensitive to 
shrinkage of the tissue. 

Table 1 contains a compilation of the experimental data in appendix A that we use 
as a basis for our calculations. In this compilation we have averaged the data in 
appendix A. 

Table 1. The cortex data in appendix A summarized for a number of mammals. All types of 
neurons are included in the counts. 

                      Human      Macaque    Cat        Rat        Mouse
Cortex area (mm^2)    2.4·10^5   2.5·10^4   8.3·10^3   6.0·10^2   2.5·10^2
Neurons               2.0·10^10  3.0·10^9   6.0·10^8   5.0·10^7   2.0·10^7
Neurons (mm^-2)       8.3·10^4   1.2·10^5   7.2·10^4   8.4·10^4   8.0·10^4
Synapses              1.5·10^14  2.2·10^13  4.5·10^12  4.0·10^11  1.6·10^11
Synapses / neuron     7500       7300       7500       7900       8000


Since we assume that the average minicolumn is composed of 100 neurons and we know 
the total number of neurons, we can calculate the number of minicolumns (Table 2). 
The area density of neurons is roughly constant, 10^5 mm^-2, and therefore the average 
minicolumn diameter is about 36 µm. This diameter fits the figures found in the 
literature. 

There is a wide variety of shapes and sizes of hypercolumns. Based on the literature 
we assume that the hypercolumn is typically a circular structure with a diameter of 
about 500 µm. In this area it is possible to fit about 160 minicolumns with a diameter 
of 36 µm. 
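
The geometry behind these figures can be sketched in a few lines. The sketch assumes a neuron area density of 10^5 per mm^2 and 100 neurons per minicolumn, and treats the minicolumn as a circle covering its share of the cortical area; the function names are hypothetical. The plain area ratio of the two circles only gives an upper bound on the number of minicolumns per hypercolumn, since circles cannot tile a disc without gaps, so the figure of about 160 quoted in the text lies somewhat below it.

```python
import math


def minicolumn_diameter_um(neurons_per_mm2, neurons_per_minicolumn):
    """Diameter of a circle holding one minicolumn's share of cortex area."""
    area_mm2 = neurons_per_minicolumn / neurons_per_mm2
    return 2.0 * math.sqrt(area_mm2 / math.pi) * 1000.0  # mm -> micrometres


def area_ratio(outer_diameter, inner_diameter):
    """How many inner-circle areas fit into the outer circle's area."""
    return (outer_diameter / inner_diameter) ** 2


if __name__ == "__main__":
    d = minicolumn_diameter_um(1e5, 100)
    print(round(d))                        # close to the 36 um in the text
    print(round(area_ratio(500.0, 36.0)))  # upper bound on minicolumns
```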

Table 2. The number of mini- and hypercolumns calculated for a number of mammals. 

                Human      Macaque    Cat        Rat        Mouse
Minicolumns     2.0·10^8   3.0·10^7   6.0·10^6   5.0·10^5   2.0·10^5
Hypercolumns    1.3·10^6   1.9·10^5   3.8·10^4   3.1·10^3   1.3·10^3

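
The column counts follow from the neuron counts by two divisions. The sketch below assumes 100 neurons per minicolumn and 160 minicolumns per hypercolumn, and uses the total neuron counts as read from Table 1; the dictionary and function names are hypothetical.

```python
NEURONS = {  # total cortical neurons, as read from Table 1
    "Human": 2.0e10, "Macaque": 3.0e9, "Cat": 6.0e8,
    "Rat": 5.0e7, "Mouse": 2.0e7,
}
NEURONS_PER_MINICOLUMN = 100
MINICOLUMNS_PER_HYPERCOLUMN = 160


def column_counts(neurons):
    """Return (minicolumns, hypercolumns) for a given neuron count."""
    minicolumns = neurons / NEURONS_PER_MINICOLUMN
    hypercolumns = minicolumns / MINICOLUMNS_PER_HYPERCOLUMN
    return minicolumns, hypercolumns


if __name__ == "__main__":
    for species, neurons in NEURONS.items():
        mini, hyper = column_counts(neurons)
        print(f"{species:8s} {mini:.2e} {hyper:.2e}")
```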
We assume that each neuron on average has 8·10^3 synapses, which gives a slight 
overestimation of the total number of synapses. Furthermore we assume that it takes 
five synapses (out of the 8·10^3 potential ones) to establish a connection between two 
minicolumns. The number of synapses devoted to intra-columnar connections is a 
negligible fraction. This means that each minicolumn, with 100 neurons and thus 
8·10^5 synapses, is capable of 1.6·10^5 
corticocortical connections. In Table 3 we compare the number of possible 
connections to the case of full connectivity between all minicolumns. The results 
show that there are enough corticocortical connections to achieve a near complete 
connectivity in the mouse cortex while in humans the number of available 
connections is only enough to give a connectivity of about one per thousand. 
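
The connectivity estimate can be reproduced with the short sketch below. It assumes 8·10^3 synapses per neuron, 100 neurons per minicolumn, and five synapses per connection, and approximates full connectivity by one connection to each minicolumn in the cortex; the constant and function names are hypothetical.

```python
SYNAPSES_PER_NEURON = 8e3
NEURONS_PER_MINICOLUMN = 100
SYNAPSES_PER_CONNECTION = 5


def connections_per_minicolumn():
    """Corticocortical connections one minicolumn can support."""
    return NEURONS_PER_MINICOLUMN * SYNAPSES_PER_NEURON / SYNAPSES_PER_CONNECTION


def total_connections(n_minicolumns):
    """Connections available in a cortex with the given minicolumn count."""
    return n_minicolumns * connections_per_minicolumn()


def connectivity(n_minicolumns):
    """Fraction of full connectivity one minicolumn can reach (capped at 1)."""
    return min(1.0, connections_per_minicolumn() / n_minicolumns)


if __name__ == "__main__":
    for species, n in [("Human", 2.0e8), ("Mouse", 2.0e5)]:
        print(species, f"{total_connections(n):.1e}", f"{connectivity(n):.1e}")
```

Because the number of connections per minicolumn is fixed while the number of potential partners grows with cortex size, the achievable connectivity falls inversely with the number of minicolumns.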

Table 3. The number of connections and connectivity in cortex. The connections are between 
minicolumns and the level of connectivity is computed as the fraction of full connectivity 
between the minicolumns. 

                              Human      Macaque    Cat        Rat        Mouse
Corticocortical connections   3.2·10^13  4.8·10^12  9.6·10^11  8.0·10^10  3.2·10^10
Connectivity                  8.0·10^-4  5.3·10^-3  2.7·10^-2  3.2·10^-1  8.0·10^-1



3 Our Model of Cortex-Sized Attractor ANN 

As stated earlier we propose that the cortex can be modeled with a multi-network of 
ANNs. The basic functionality of this network is imagined to be similar to that of a 
large attractor memory. In order to simplify the model further and its implementation 
we will analyze a single cortex-sized recurrent ANN. 

3.1 Appropriate Mapping 

A common choice is to represent each neuron with a unit of the ANN. This approach 
is appealing because the mapping from the physical neuron to the functional unit in 
the ANN is quite straightforward. A disadvantage with this mapping is that if a single 
neuron is lost the system becomes affected. Another problem is that of mapping the 
synaptic connections of neurons onto those of the ANN. A neuron only has either 
excitatory or inhibitory synapses, not both as the units in an ANN do. Furthermore, this 
mapping only supports very low levels of connectivity in a network. 

We argue that the cortical minicolumn is the appropriate structure to map onto the 
unit of an ANN. The benefits of adopting this view are that the redundancy and 
connectivity of the cortical functional units are increased. If a cell in the cortical 
minicolumn dies, not much is lost. The local connectivity in a network of minicolumns 
can be high. A network based on minicolumns can explain the existence of excitatory 
and inhibitory connections between its units. 

3.2 Appropriate Complexity 

The functional complexity of the units in an ANN is arbitrary as long as we do not know 
exactly what functional properties are actually relevant for information processing. A 
top-down view in terms of attractor memories and learning makes it possible to infer the 
functional requirements of the units. 

In many models where large ANNs are used the functional capabilities of the 
units are reduced to a minimum. In the extreme case the units have a binary activity 
and binary connection weights as in a Palm-Willshaw network [33, 34]. With these 
binary units one runs into problems, for instance, when trying to construct a 
palimpsest memory with one-shot learning. In our view, the appropriate level of 
functionality is higher than in those binary units. We want to be able to construct 
efficient incremental and palimpsest memories [35, 36]. Furthermore we want an 
architecture that enables the use of eligibility traces and relevance signals. This makes 
it possible to use several different learning methods such as reinforcement learning 
and attention controlled learning. 

3.3 General Properties of the Model 

When considering a model of the cortex it is important that it scales well in terms of 
implementation. Many of the more abstract models use layered networks that are 
trained with back-propagation algorithms. This is obviously not done in the brain 
since it requires non-local computations, which are costly in terms of communication. 
ANNs that use non-local computations typically have poor scaling properties. More 
biologically realistic models implement learning rules that only require local 
computations, such as the Hebbian rule. The advantage of local learning rules is 
that they have the potential to scale well. 

Using hypercolumns in an ANN makes it similar to the Potts neural network [37-40]. 
In the binary Potts networks each unit can have more than two states, as opposed 
to a general Hopfield network where the units only have two states. Hypercolumns also 
seem to slightly reduce the number of spurious states in the memory compared to 
other threshold control strategies [40]. 

3.4 Implementation Issues 

ANNs have for a long time been implemented in special purpose hardware. A 
problem in many of these hardware designs has been the communication bottleneck 
between the units. Silicon designs are built in two dimensions instead of three as in 
cortex, which means that there is less space for communication hardware [41]. 

A commonly used method to avoid the communication bottleneck and achieve a 
scalable neural system is to use AER (Address Event Representation) [42-44] in the 
inter-unit communication. AER means that the activity in the ANN is represented as 
discrete spikes and the only information that needs to be transmitted over the 
computer network is the address or an identification of the spiking units. Spiking 
communication is often also paired with heuristic knowledge about the 
communication in the ANN. A commonly used method is to only allow units with a 
large enough change in their input to transmit their activity. With AER it has been 
possible to run simulations with up to 10^ units (without learning) [45]. AER also 
enables good load balancing, which is important especially if the networks or the 
computational resources are heterogeneous [46]. 

An ANN is composed of units that are arbitrarily connected, whereas cortex is 
composed of neurons that have a preference for local interactions. The topology of 
most of today’s cluster computers is very flat. The nodes of a regular cluster 
computer are normally connected with high-speed Ethernet. This means that the 
topology of the nodes is a low-order star-shaped one. The consequence is that even 
if there is a topology in the ANN one is unable to take advantage of it. Thus, 
one may as well assume random connectivity in the ANN. This assumption is valid 
since the preference for local interactions in cortex is presumably due to 
performance reasons, and a system with global interactions is at least equivalent to 
one with only local interactions. Furthermore, it means that all communication goes 
through the same channel and has to be kept at a minimum in order to avoid a 
communication bottleneck. 

The hypercolumns allow a very effective implementation of spiking communication. 
One or a few hypercolumns are placed on each node, with local memory relating 
to incoming connections. The WTA circuitry in a hypercolumn means that only one 
spike is generated in each hypercolumn at each time-step. Information about which 
unit in the hypercolumn generated the spike is broadcast to the other 
hypercolumns. This allows for a minimum of communication between the 
computational modules and enables coarse-grain parallelism. 
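
The winner-take-all step described above can be illustrated with a minimal sketch. It assumes that each hypercolumn is a plain list of unit activities and that a spike message is a pair (hypercolumn index, winning unit index); the function name, the message layout, and the example values are hypothetical and are not taken from the implementation discussed here.

```python
def hypercolumn_spikes(activities):
    """Return one spike address per hypercolumn.

    activities[h][u] is the activity of unit u in hypercolumn h.
    Winner-take-all: only the most active unit of each hypercolumn spikes,
    so exactly one address per hypercolumn is produced per time-step.
    """
    messages = []
    for h, units in enumerate(activities):
        # Index of the most active unit (first one wins on ties).
        winner = max(range(len(units)), key=lambda u: units[u])
        messages.append((h, winner))
    return messages


# Three hypothetical hypercolumns with four units each.
activities = [
    [0.1, 0.7, 0.2, 0.0],
    [0.3, 0.3, 0.9, 0.1],
    [0.8, 0.1, 0.0, 0.1],
]
spikes = hypercolumn_spikes(activities)
```

The list `spikes` is what would be broadcast to the other hypercolumns; its length equals the number of hypercolumns regardless of how many units each one contains.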

3.5 Precise Specification of the Computational Demands of the Model 

A connection between two minicolumns is supported by several synapses. In the 
calculations we rather arbitrarily use the figure of five synapses per connection. 

The memory requirements and the computational load are proportional to the 
number of connections. Two 32-bit variables represent each connection. One 
variable is used for an eligibility trace and the other one is used for the memory. Both 
the eligibility trace and the memory are implemented with first-order differential 
equations, which are solved with Euler’s method [35]. Five operations are needed to 
update one connection. 
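
As an illustration of what an Euler update of the two per-connection variables can look like, the sketch below assumes that both the eligibility trace and the memory are simple leaky integrators. The actual equations are those of [35] and are not reproduced here; the function name, the time constants, and the input signal are assumptions made only for this example.

```python
def euler_step(trace, weight, signal, dt, tau_trace, tau_weight):
    """One explicit Euler step for a connection's two state variables.

    Assumed dynamics (illustrative only, not the equations of [35]):
        d(trace)/dt  = (signal - trace) / tau_trace
        d(weight)/dt = (trace - weight) / tau_weight
    Both right-hand sides are evaluated with the old values.
    """
    new_trace = trace + dt * (signal - trace) / tau_trace
    new_weight = weight + dt * (trace - weight) / tau_weight
    return new_trace, new_weight


# One 10 ms step from rest with a unit input signal (hypothetical values).
trace, weight = euler_step(0.0, 0.0, 1.0, dt=0.01, tau_trace=0.1, tau_weight=1.0)
```

With the factors dt/tau precomputed, each variable costs a subtraction, a multiplication and an addition per step, which is of the same order as the five operations per connection quoted above.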

Real-time operation of our model is defined as an update frequency of 100 updates 
per second of all weights and activity values in the ANN. 

4 Performance Demands for a Cortex-Sized Attractor ANN 

The limiting factor when simulating large ANNs is the number of connections 
(synapses), since it is generally several orders of magnitude greater than the number 
of neurons. 

If we simulate a large fully connected ANN we need to communicate the activity 
of every hypercolumn to every other hypercolumn. If we run the ANN on a cluster 
composed of a large number of processor nodes we need to transfer the activity 
between these nodes. This communication needs to occur about once every 10 ms. 

The bandwidth required for the communication between the computational nodes 
is determined by the ANN structure, the topology of the nodes, the communication 
protocol, and the update frequency. What particularly matters in the ANN structure is 
the number of hypercolumns. Here we let each hypercolumn generate a 32-bit 
message at each update that identifies the spiking unit. The computational nodes are 
connected by Ethernet connections and thus the topology is flat, which means that all 
nodes have to share the same bandwidth. On an Ethernet there are two choices of 
communication protocol, TCP and UDP. 

With TCP-based communication no messages are lost and the ordering of the data 
transfers is deterministic. In a TCP-based implementation one of the nodes is declared 
a server while the other nodes are declared clients. The server node controls the flow 
of messages between all nodes. This incurs a large communication overhead. Popular 
communication APIs such as MPI and PVM are based on this type of communication 
[47]. 

UDP-based communication does not guarantee that messages arrive. Nor is there 
any control over when messages arrive or in what order they arrive. The benefit 
of UDP is a very small communication overhead. 

M is the amount of data generated by all hypercolumns during a single update, k is 
the size of a message header, C is the total number of nodes, and f is the frequency of 
updates. The required bandwidth for TCP, $B_{TCP}$, is then computed with eq. (1) and 
for UDP, $B_{UDP}$, with eq. (2). 

$B_{TCP} = C \, (M + M/C + 2k) \, f$    (1) 

$B_{UDP} = C \, (M/C + k) \, f$    (2) 
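
The two bandwidth expressions, B_TCP = C(M + M/C + 2k)f and B_UDP = C(M/C + k)f, can be evaluated with a small sketch such as the one below. The number of hypercolumns and the number of nodes used in the example are hypothetical, and the header size k is assumed to be given in the same unit as M (bits).

```python
def tcp_bandwidth(m_bits, k_bits, nodes, freq_hz):
    """Eq. (1): B_TCP = C (M + M/C + 2k) f, in bits per second."""
    return nodes * (m_bits + m_bits / nodes + 2 * k_bits) * freq_hz


def udp_bandwidth(m_bits, k_bits, nodes, freq_hz):
    """Eq. (2): B_UDP = C (M/C + k) f, in bits per second."""
    return nodes * (m_bits / nodes + k_bits) * freq_hz


# Hypothetical system: 10,000 hypercolumns, one 32-bit message each per update,
# 10 nodes, 100 updates per second, header size neglected (k = 0).
hypercolumns = 10_000
m_bits = hypercolumns * 32
udp = udp_bandwidth(m_bits, 0, nodes=10, freq_hz=100)
tcp = tcp_bandwidth(m_bits, 0, nodes=10, freq_hz=100)
```

With k = 0 the UDP requirement reduces to M f and is independent of the number of nodes, whereas the TCP requirement grows with C.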

Table 4. The minimum bandwidth required for communication between the hypercolumns. A 
system of comparable size to human cortex is termed H-sized and so on for macaque, cat, rat, 
and mouse. These figures are computed with eq. (2) for M = hypercolumns · 32, k = 0, and f = 100. 

                     H-sized   Ma-sized   C-sized   R-sized   Mo-sized 
Bandwidth (Mbit/s)   3800      570        110       10        4 





Fig. 2. The bandwidth requirement computed for two communication protocols. The left plot 
shows the requirement for TCP-based communication and the right plot shows the requirement 
for UDP-based communication. Note that the number of nodes is much smaller than some of 
the values in Table 6. Parameters: M = hypercolumns · 32, f = 100, and k = 12. 






UDP-based communication scales in proportion to the number of hypercolumns, 
while TCP-based communication scales in proportion to the number of 
hypercolumns and nodes (Fig. 2). In practice this is not really true. TCP-based 
communication uses the full available bandwidth, but UDP-based communication does 
not. The reason is that in the UDP implementation the nodes have a refractory period 
before they send their next message. If asynchronous communication is used 
together with UDP, however, there is no need for a refractory period. 

The memory and number of operations required to run an ANN of a certain size are 
given in Table 5. Both the required memory and the number of required operations 
are functions of the number of connections. The maximum number of connections in 
a fully connected ANN with N units is $N^2$. The memory requirement in bytes is 
calculated as Memory = $(2 \cdot 32/8) N^2 = 8N^2$. The number of operations (OPS) 
and connections per second (CPS) are calculated as OPS = $5N^2$ and CPS = $N^2$. 

The newest high-end computer at KTH is an Itanium II based Linux cluster. Each 
node in this cluster has 6 GB of memory and a peak performance of 7.2 GFLOP. 
Based on these figures we can calculate the number of such nodes needed to run 
ANNs of various sizes. In Table 6 the first row shows how many nodes must be 
used to fit the ANN into memory. The second row shows how many nodes must 
be used if we also want to run the ANN in real-time, i.e. a fixed-point convergence in 
about 200 ms. 
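
The node counts follow from simple arithmetic, sketched below under stated assumptions: 8 bytes and 5 operations per connection, 100 updates per second, 1 GB taken as 10^9 bytes, and one operation counted as one floating-point operation at peak rate. The connection count in the example is a hypothetical input, not a figure quoted from the tables.

```python
import math

BYTES_PER_CONNECTION = 2 * 32 // 8     # two 32-bit variables per connection
OPS_PER_CONNECTION = 5                 # operations per connection update
UPDATES_PER_SECOND = 100               # real-time update frequency
NODE_MEMORY_BYTES = 6 * 10**9          # 6 GB per node (assuming 1 GB = 10^9 bytes)
NODE_PEAK_OPS = 7_200_000_000          # 7.2 GFLOP peak per node


def requirements(connections):
    """Return (memory in bytes, operations per second) for real-time operation."""
    memory = BYTES_PER_CONNECTION * connections
    ops_per_second = OPS_PER_CONNECTION * connections * UPDATES_PER_SECOND
    return memory, ops_per_second


def nodes_needed(connections):
    """Return (nodes needed to hold the ANN, nodes needed to run it in real-time)."""
    memory, ops_per_second = requirements(connections)
    return (math.ceil(memory / NODE_MEMORY_BYTES),
            math.ceil(ops_per_second / NODE_PEAK_OPS))


# Hypothetical network with 3 * 10^10 connections.
memory_nodes, compute_nodes = nodes_needed(30 * 10**9)
```

For this input the computation, not the memory, determines the number of nodes, by about a factor of fifty.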

Table 5. Memory and operations required to run an ANN in real-time. CPS is connections per 
second and OPS is operations per second. 

                      H-sized    Ma-sized   C-sized    R-sized    Mo-sized 
Memory (10^9 bytes)   2.4·10^5   3.6·10^4   7.2·10^3   6.0·10^2   2.4·10^2 
CPS (10^9)            3.2·10^6   4.8·10^5   9.6·10^4   8.0·10^3   3.2·10^3 
OPS (10^9)            1.6·10^7   2.4·10^6   4.8·10^5   4.0·10^4   1.6·10^4 


Table 6. Number of nodes needed to run an ANN. This number is computed both for the case 
where the memory is the limiting factor and where the computations are the limiting factor. 

                             H-sized    Ma-sized   C-sized    R-sized    Mo-sized 
Nodes with 6 GB of memory    4.0·10^4   6.0·10^3   1.2·10^3   9.9·10^1   4.0·10^1 
Nodes capable of 7.2 GFLOP   2.2·10^6   3.3·10^5   6.7·10^4   5.6·10^3   2.2·10^3 


The bandwidth required to run a network that simulates the mouse or rat cortex is 
available in off-the-shelf Ethernet communication hardware. With a high-speed 
network of the kind found in today’s supercomputers it would even be possible to 
simulate an ANN of comparable size to the human cortex. 

The peak performance of a modern processor is a few GFLOPs. This means that 32 
modern processors, capable of 5 GFLOP each, would be required to run an ANN 
comparable to the mouse cortex in one hundredth of real time. In order to run a 
structure comparable to the human cortex we still lack 3 orders of magnitude in 
computing power, and 5 orders of magnitude if we want to run it in real-time. 

We have simulated ANNs with up to 5·10^9 continuous-valued 32-bit connections 
[47]. The effective performance is about 6·10^9 OPS when running on 60 nodes in a 
Linux cluster, which is only 2% of the peak performance of the processors. The 
program is written in Java and the performance is largely limited by the overhead in 
the Java VM. These simulation results imply that running an ANN with almost the 
same number of connections as a mouse cortex is possible, but at one 
thousandth of real-time. A speed-up of an order of magnitude is probably possible if 
the code is implemented in C. 

4.1 Hardware 

The limiting factor in our model is the computational requirement. The memory 
requirement in a hypercolumn module is low in comparison to the computational 
requirement. Also the communication requirement between the hypercolumns is 
relatively low. The fact that the computations are the limiting factor means that our 
model would gain enormously from a hardware implementation providing parallelism 
on the nodes as well as in the cluster. 

In our model the size and number of connections per hypercolumn are constant from 
mouse to man. This makes it possible to specify the exact requirements for a 
hypercolumn computational module. The hypercolumn is composed of 160 
minicolumns and these have a total of 3·10^7 incoming connections. The 
memory needed for the hypercolumn module is 200 MB and the number of operations 
is 13 GOP. The memory bandwidth required in the module is 20 GB/s. 

5 Discussion 

Taking the view of the cortical minicolumn as the basic functional unit in cortex of 
mammals has allowed us to map cortical networks onto an ANN in an efficient way. 
We argue that the units of the model we propose have an appropriate level of 
complexity and that their functionality should correspond well to that of the 
minicolumns. 

Observing the hypercolumnar structure in nature and using this in the model has 
enabled a very communication efficient implementation. The ANN model we suggest 
does not suffer from a communication bottleneck between its units as many other 
ANN algorithms do. The local Hebbian learning rule gives the model good scaling 
properties. The local learning rule together with the communication efficient modular 
hypercolumn structure allows for coarse grained parallelism. 

We assume that the basic size and shape of the modular structures, i.e. the mini- and 
hypercolumns, are the same in all mammals. Since the number of synapses per 
neuron is roughly constant from mouse to man, the number of connections to each 
hypercolumn is also constant from mouse to man. In mouse the connectivity between 
minicolumns is almost full, while in humans the connectivity is only one per thousand 
in our model. If we think of cortex as an attractor memory, the storage capacity scales 
in proportion to the number of units in our model. This is in contrast with models that 
preserve the connectivity density, where the number of memories scales in proportion 
to the square of the number of units, i.e. the number of connections. In raw numbers 
this means that in our model the memory capacity of humans compared with mouse is 
10^3 times higher, and in the connectivity-preserving model the memory capacity is 
10^6 times higher. Humans live approximately 40 times longer than mice; it thus seems 
more reasonable that our memory capacity in terms of entries is increased by a factor 
of 10^3, and not 10^6. 

The constant number of synapses to each neuron means that the metabolic 
demands on the neurons are constant from mouse to man. This is physiologically 
reasonable since it implies that the maximum level of metabolism is reached. 
Another argument for a constant number of synapses per neuron is that the strength of 
PSPs (post-synaptic potentials) is at the same level in both mouse and man. If the 
neurons in man had more synapses, the synapses would need to be weaker or the 
neurons would receive a stronger input, which in turn would make the PSP larger. 

The hypothesis that there exists a columnar structure in cortex is strong for 
humans, monkeys and cats. But for smaller mammals such as rat and mouse this 
hypothesis is rather vague. Here, we have assumed that the columnar structure also 
applies to these smaller mammals and provided functional, physiological, and 
evolutionary arguments for this. Evolution usually keeps good solutions. The cortex 
has been a big advantage for mammals. It would be strange if the same 
computational structures, principles, and solutions were not used in the cortex of 
different mammalian species. 

A scaling theory that is based on the idea of conserving the level of connectivity in 
the cortex without altering the size of the minicolumns cannot explain the fact that the 
number of synapses per neuron remains constant. Another problem with such a model 
is that it would require many more myelinated fibers than are seen in the large 
brains of primates. 

A compromise is to let the hypercolumns vary in size. This means that one would 
trade the number of units against the level of connectivity. In a scaling model 
proposed by Braitenberg [48] the size of the modules is the square root of the number 
of units. Applying such an idea, in which the size and number of hypercolumns can 
vary, would make the difference in connectivity density between mouse and man less 
than in the current model. 

The calculations show that we are capable today of attaining the goal of simulating 
an ANN of comparable size to the mouse cortex, but not in real-time. The 
computational resources needed to run an ANN of comparable size to the human 
cortex are at least 3 orders of magnitude greater than what we can put up today. 

A conclusion from our estimations is that the limiting factor is the computational 
resources. It is therefore necessary to make a hardware implementation of the ANN in 
order to fulfill the vision of simulating the human cortex. Structures similar to the 
hypercolumn have already been implemented in hardware [49] . 

Hardware implementations of ANNs often focus on speed of operation. A single 
iteration in these networks can be as fast as 10 ® s [50]. The size of these ANNs is 
usually small. The neural networks in nature are usually very large but much slower. 
The networks in nature have an iteration time of about 10'®- 10’^ s. Other things that 
differ between artificial and biological neural networks are that most hardware 
implemented ANNs lack the ability to learn and that many implementations have very 
simple units with binary weights and activity. 

The electronic circuitry in today’s computers is getting smaller and smaller. This 
means that the error rates in the electronic memories and processors are getting higher 
and higher. Shielding the applications that run on this electronic hardware from errors 
is thus getting more and more expensive. A solution to this problem is to use error 
redundant algorithms such as ANNs [51]. This idea also applies to the control systems 
needed in the future nano-mechanic gadgets that are on today’s drawing boards. 

Appendix A 

Human 

Cortex Area (cm^2): 2200 [24, 25], 2400 [9, 10], 2600 [18] 
Neurons in Cortex: M0“ [16], 2'10“ [14, 32], 3-10"’ [25] 
Synapses in Cortex: 6'10'" [16], 1.5'10'‘' [13], 2.4-10'''[14] 
Average Thickness of Cortex (mm): 2.8 [9, 10] 

Macaque (mostly rhesus) 

Cortex Area (cm^2): 250 [9] 
Neuron Density (mm^-3): 5'10‘' [14, 16] (primates), A\(f (whole brain) [52], 
4-6-10'' [53], 7-14- lO'' [54] 
Synapse Density (mm^-3): 2- 10* [54] 
Average Thickness of Cortex (mm): 2.2 [9, 10] 

Cat 

Cortex Area (cm^2): 83 [9, 10], 90 [55] 
Neuron Density (mm^-3): 5-10'' (visual cortex) [14, 52], 3' 10* (motor cortex) [12] 
Synapse Density (mm^-3): 3'10* [12, 14] 
Average Thickness of Cortex (mm): 1.8 [9, 10] 

Rat 

Cortex Area (cm^2): 4-5 [25], 6 [10], 6.4 [9] 
Neurons in Cortex: 6.5' lO" [25] 
Neuron Density (mm^-3): 4.4T0'' (whole brain) [52], 4.8-7.7T0'' mean 6.5-10'' [56] 
Average Thickness of Cortex (mm): 1.2 [6], 1.4-1.9 mean 1.7 [56] 

Mouse 

Cortex Area (cm^2): 2-4 [57] 
Neurons in Cortex: 1.6T0’ [8], 4T0’ [58] 
Synapses in Cortex: order of 10“ [8] 
Neuron Density (mm^-3): 9.2T0'' [8, 14], 8.3T0'' [52] 
Synapse Density (mm^-3): 7.2T0* [8, 14] 
Average Thickness of Cortex (mm): 0.8 [6], 0.8-0.9 [8] 

Abstract. We present an emotion-based hierarchical reinforcement 
learning (HRL) algorithm for environments with multiple sources of reward. 
The architecture of the system is inspired by the neurobiology of 
the brain and particularly those areas responsible for emotions, decision 
making and behaviour execution, being the amygdala, the orbitofrontal 
cortex and the basal ganglia respectively. The learning problem is decomposed 
according to sources of reward. A reward source serves as a goal for 
a given subtask. Each subtask is assigned an artificial emotion indication 
(AEI) which predicts the reward component associated with the subtask. 
The AEIs are learned simultaneously with the top-level policy and are 
used to interrupt subtask execution when the AEIs change significantly. 
The algorithm is tested in a simulated gridworld which has two sources 
of reward and is partially observable. Experiments are performed comparing 
the emotion-based algorithm with other HRL algorithms under 
the same learning conditions. The use of the biologically inspired architecture 
significantly accelerates the learning process and achieves higher 
long-term reward compared to a human-designed policy and a restricted 
form of the MAXQ algorithm. 



1 Introduction 

By decomposing a reinforcement learning problem into a hierarchy of subtasks 
the learning process can be accelerated and the subtask policies can be shared 
across hierarchical levels [1,2]. However, many HRL algorithms depend heavily 
on human knowledge and experience to provide the structure of the hierarchy 
and to identify subtask modules. The hierarchical structure and subtask modules 
are usually designed for a specific problem and the agent has to irrevocably 
commit to them during both the learning and application processes. Some research 
such as [3] is being done to identify the appropriate hierarchical structure as part 
of the learning process. In most HRL algorithms reward signals are treated as if 
arising from a single source. If a reward source appears or is removed, policies 
need to undergo further learning iterations to adapt to the change [4]. Rather, 
we propose that HRL systems should recognise changes in the reward structure 
explicitly and only be required to re-learn when new sources of reward are 
encountered. Animals solve this problem very well. Suppose an animal learns three 
tasks such as eating, drinking and escaping separately in distinct environments (each 
environment having only one corresponding goal). When the animal is then placed 
in a new environment with all three of the reward sources present, it is almost 
immediately able to correctly associate the appropriate behaviour with the 
corresponding source of reward. Further, the priorities of these goals may vary with 
time according to internal states, such as energy levels. Emotion is known to play 
an important role in flexible and adaptive behaviour [5]. Emotion is motivating 
and prioritising. Thus, by weighting different objectives and situations emotions 
provide animals with an important capability to deal with multiple goals in an 
uncertain environment [5]. Recently, considerable biological evidence has shown 
that brain areas such as the amygdala, orbitofrontal cortex (OFC), basal ganglia 
(BG) and dopamine neurons are involved in emotional/motivational control of 
goal-directed behaviour [5,6]. 

Inspired by neurobiology, Zhou and Coggins [7] recently proposed an 
emotion-based hierarchical reinforcement learning algorithm. This paper extends this 
work as follows. Compared to [7] the experiments have been modified to provide 
all algorithms with identical state abstractions, thereby providing additional 
evidence for the advantages of the proposed approach and enabling a more 
detailed performance comparison and analysis. The paper is organized as follows. 
Section 2 briefly outlines system-level functions of the brain areas involved in 
emotion processing and responses. Section 3 presents our biologically inspired 
model. A more detailed justification is found in [7]. In Section 4 the 
implementation of the models and experimental results are presented. Section 5 is the 
conclusion. 



2 Biological Background and Guidance 

The amygdala is a subcortical region in the front side of the brain, recognized 
to be at the center of sensory/emotional associations including fear, danger, 
and satisfaction, as well as other motivational and emotional sensations [8]. The 
basal lateral nucleus of the amygdala (BLA) receives highly processed information 
mainly from cortical sensory processing areas and is believed to be involved 
in evaluation of the affective significance of sensory cues. The information is 
processed through intra-amygdala connections and transferred toward the central 
nucleus of the amygdala (CeA), which is the interface with the motor system 
and is involved in instrumental responses. Working closely with the amygdala, 
the orbitofrontal cortex (OFC) is known to be involved in decision-making and 
adaptive response selection [5]. The BLA is hypothesized to encode the emotional 
significance of cues and the OFC uses this information in the selection 
of an appropriate behavioural strategy [5]. The BG is known to be implicated 
in reinforcement learning-based sensorimotor control [9]. Thus, through 
frontal-striatal projections the BG provides one of the important information processing 
and behaviour execution systems for emotion and motivation processing [5]. The 
midbrain dopamine neurons have divergent projections to the OFC, BG and 
amygdala, which appear to report salient events in the form of a prediction 
error [6]. 



3 The Biologically Inspired Model 

Guided by the aforementioned neurobiology of the related brain areas, an 
emotion-based hierarchical reinforcement learning algorithm for problems with 
multiple sources of reward was proposed [7]. Figure 1 shows the schematic diagram of 
the proposed architecture. Raw sensory inputs are relayed to the cortex through 
the thalamus (the sensory relay center in the brain). High level features are 
extracted in the cortical areas and projected to the amygdala and the OFC. The 
OFC, amygdala, BG and dopamine neurons constitute a multiple-level hierarchical 
reinforcement learning system. The OFC is at the top level and makes the 
abstract decision of choosing among component behaviour alternatives to maximize 
overall reward. The BG is at the bottom level and executes the chosen 
behaviour. The architecture is focused on multitask problems defined by a set 
of reward components. The entire task is decomposed according to the sources 
of reward. Each reward component serves as a subgoal that defines a subtask. 
Each subtask pursues the corresponding subgoal. We assume each source of reward 
can be distinguished by the value of reward components or other properties 
(e.g. color, tone). In general this classification must be learned. The objective 
of a subtask is to maximize the corresponding reward component. In the end, 
every subtask contributes towards maximizing the cumulative reward for the 
entire task. 

Fig. 1. Left: The emotion-based HRL architecture of the amygdala, OFC, BG and 
dopamine neurons. Right: The schematic diagram of AEI learning. 

We assign an artificial emotion indication (AEI) to each subtask. 
AEIs approximate emotion representations in the BLA and are defined as the 
predictions of the reward components corresponding to the subtasks. The BLA 
learns and maintains the association of high-level sensory features with AEIs. 
AEIs can be learned simultaneously along with the OFC using classical conditioning 
by the method of temporal differences (TD) [10]. We modify the TD 
version of classical conditioning so that it can be used with state abstractions 
where state transitions take a variable amount of time. Figure 1 (right) shows 
the AEI learning process. The BLA maps sensory features into AEIs at every time 
step. The BLA memory is updated according to the AEI prediction error provided 
by dopamine neurons. Let $x_c$ and $x'_c$ be the current and next feature 
states corresponding to a given AEI respectively, where n denotes the number 
of primitive time steps taken from state $x_c$ to $x'_c$, and the subscript c indexes 
over emotion categories directly corresponding to subtasks and the associated single 
source of reward for that subtask. Then the AEI value, $V_c$, is expressed as the 
expected sum of discounted future reward component values, given by 

$V_c(x_c) = E\{ r_{c,t+1} + \gamma_c r_{c,t+2} + \cdots + \gamma_c^{n-1} r_{c,t+n} + \gamma_c^{n} r_{c,t+n+1} + \cdots \}$,    (1) 

where $V_c(x_c)$ is the AEI value for feature state $x_c$ at time t, $\gamma_c$ is a discount 
factor that accounts for the decreasing AEI value of rewards further into the 
future, $r_{c,t}$ is the pay-off received for reward component c at time t and E is the 
expectation operator. 

The AEI prediction value, given by Equation (1), can be divided into two 
parts, one of which is the discounted cumulative pay-off for the reward component 
over the period of transition from $x_c$ to $x'_c$, and the other is the 
new expected sum of discounted future reward component values after the state 
transition. In fact, the second part is the AEI prediction value of the next state 
$x'_c$ at time (t + n), given by 

$V_c(x'_c) = E\{ r_{c,t+n+1} + \gamma_c r_{c,t+n+2} + \cdots \}$.    (2) 

Thus, let $\bar{r}_{c,t}$ denote the cumulative pay-off for reward component c received 
during the transition from $x_c$ to $x'_c$, given by 

$\bar{r}_{c,t} = \sum_{i=1}^{n} \gamma_c^{i-1} r_{c,t+i}$.    (3) 

Equation (1) can then be rewritten as 

$V_c(x_c) = \bar{r}_{c,t} + \gamma_c^{n} V_c(x'_c)$.    (4) 

Hence, the expected AEI value at each state is equal to the cumulative pay-off 
for the reward component received at the next state plus the next expected 
AEI value discounted by $\gamma_c^{n}$. The discrepancy between the expected and the 
predicted values is the prediction error, $\delta_{c,t}$, which is given by 

$\delta_{c,t} = \bar{r}_{c,t} + \gamma_c^{n} V_{c,t}(x'_c) - V_{c,t}(x_{c,t})$.    (5) 

Using an eligibility trace the AEI value is updated by 

$V_{c,t+n}(x_c) := V_{c,t}(x_c) + \alpha_c \, \delta_{c,t} \, e_c(x_c)$,    (6) 

where $\alpha_c$ is the learning rate, and $e_c(x_c)$ is an eligibility trace that underlies the 
retroactive effects of reward on the feature space. The eligibility trace is updated 
according to 

$e_c(x_c) := \begin{cases} \gamma_c^{n} \lambda_c e_c(x_c) & \text{if } x_c \neq x_{c,t} \\ \gamma_c^{n} \lambda_c e_c(x_{c,t}) + 1 & \text{if } x_c = x_{c,t} \end{cases}$    (7) 

where $\lambda_c$ is the decay rate of the eligibility trace, $x_c$ ranges over all states for 
an AEI, and $x_{c,t}$ is the current state. 
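
The AEI update for one variable-duration transition can be sketched in tabular form as follows. The sketch follows the usual TD(λ) pattern: discounted pay-off over the n steps, prediction error, accumulating eligibility trace, value update. The dictionary representation, the function and parameter names, and the choice to update the traces before the values are assumptions made for illustration and are not taken from the original implementation.

```python
def aei_update(values, traces, x, x_next, rewards, gamma, lam, alpha):
    """Update the AEI values of one emotion category after a transition.

    values, traces: dicts mapping feature states to V_c and e_c.
    rewards: pay-offs r_{c,t+1}, ..., r_{c,t+n} received while moving
             from x to x_next (n = len(rewards) primitive time steps).
    Returns the prediction error delta.
    """
    n = len(rewards)
    # Discounted cumulative pay-off over the transition.
    payoff = sum(gamma ** i * r for i, r in enumerate(rewards))
    # Decay every trace by gamma^n * lambda, then add 1 for the current state.
    decay = gamma ** n * lam
    for state in traces:
        traces[state] *= decay
    traces[x] = traces.get(x, 0.0) + 1.0
    # Prediction error: pay-off plus discounted next value minus current value.
    delta = payoff + gamma ** n * values.get(x_next, 0.0) - values.get(x, 0.0)
    # Each state is corrected in proportion to its eligibility trace.
    for state, e in traces.items():
        values[state] = values.get(state, 0.0) + alpha * delta * e
    return delta


# Hypothetical two-step transition from state "A" to state "B".
values, traces = {}, {}
delta = aei_update(values, traces, "A", "B", rewards=[0.0, 1.0],
                   gamma=0.9, lam=0.8, alpha=0.5)
```

States that have never been visited are treated as having value and trace zero, which is an assumption of this sketch.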

These artificial emotion indications (AEIs) constitute a subtask based de- 
composition of the value function. AEIs can be thought of as the estimates 
of the corresponding subtask value. AEIs are independent of each other. The 
artificial emotion indicators aggregate sensory inputs into subtask specific vari- 
ables, which provide the agent selective perception of the environment in terms 
of the agent’s concerns (reward or punishment). Therefore, AEIs constitute the 
artificial emotion state of the agent and are proposed as state abstractions for 
the OFC and CeA. Further, being the estimate of subtask values the artificial 
emotion state serves as an interruption triggering mechanism for the currently 
executing sub-policy in the OFC. When the artificial emotion state changes 
significantly, the OFC re-evaluates the current sub-policy to decide whether to 
continue it or select another one. We argue that the OFC, the CeA and dopamine 
neurons work together as a temporal difference reinforcement learning system. 
The OFC acts as an actor that maps state space to sub-policy space. The CeA 
is a critic that provides the dopamine neurons with the reinforcement prediction 
P\{f). Using the reinforcement prediction and primary reinforcement received, 
r{t), the dopamine neurons calculate effective reinforcement (temporal error). 
Based on the effective reinforcement the synaptic strengths of the CeA and the 
OFC are updated. This is a Semi-Markov Decision Process (SMDP) in which 
the actions take a variable amount of time to complete. The OFC policy can be 
learned using the SMDP version of one-step Q-learning [2,11]. After observing a 
state $s$, executing a subtask $o$, terminating the subtask, and observing the next 
state $s'$, the action value function of the OFC for arbitrating subtask policies is 
updated according to 

$$Q_t(s,o) := Q_{t-1}(s,o) + \eta \left[ r + \gamma_q^k \max_{o' \in O_{s'}} Q(s',o') - Q(s,o) \right], \qquad (8)$$

where $\eta$ is the learning rate, $k$ is the number of time steps elapsing during the 
subtask $o$, $\gamma_q$ is the discount factor, and $r$ is the cumulative reward received 
over the period of subtask execution. In the BG, we model subtasks as individual 
reinforcement learning systems. Similarly, the sensorimotor striatum and limbic 
striatum of the BG can be modeled as an actor and a critic respectively [9]. The 
limbic striatum provides reinforcement prediction, $P_2(t)$, to dopamine neurons that 
calculate the effective reinforcement. The synaptic strengths of both the sensorimotor 
and the limbic striatum are updated using the effective reinforcement. In this paper 
we assume that subtasks have already been learned in the BG. 

Fig. 2. The simulated gridworld. G denotes a gateway, X stands for "exit", and 
A, B, C, D are possible homes for the agent. 
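The SMDP Q-learning update of Equation (8) can be sketched as follows. This is a minimal tabular illustration, not the authors' implementation: the function name `smdp_q_update`, the dictionary-based table, and the example states and rewards are assumptions, while the default learning rate and discount factor are those given in Section 4.2.

```python
from collections import defaultdict


def smdp_q_update(Q, s, o, r, k, s_next, options, eta=0.2, gamma_q=0.9):
    """One-step SMDP Q-learning update after subtask `o` ran for `k` primitive steps."""
    best_next = max(Q[(s_next, o2)] for o2 in options)       # max over subtasks in s'
    td_error = r + gamma_q ** k * best_next - Q[(s, o)]      # discount raised to the k-th power
    Q[(s, o)] += eta * td_error
    return td_error


Q = defaultdict(float)
Q[("s1", "eat")] = 1.0                                       # hypothetical prior estimate
delta = smdp_q_update(Q, "s0", "eat", r=9.0, k=10, s_next="s1",
                      options=["eat", "escape"])
```

The only difference from ordinary one-step Q-learning is the exponent `k`: a subtask that runs longer discounts the value of the successor state more heavily.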

4 Experiments 

To test the proposed biologically inspired HRL architecture, we developed a 
simulated gridworld environment with multiple sources of reward corresponding 
to multiple simultaneous behavioural goals. We show how the artificial emotion 
mechanism plays a role in the combination of existing policies without needing 
to modify them. 



4.1 Experimental Environment 

There are two rooms in a gridworld that are connected by a gateway as shown 
in Fig. 2. An agent is supposed to live in it, searching for food, eating food and 
escaping from predators. The agent has 6 actions: moving east, south, west and 
north, chewing food and doing nothing. If the agent tries to move through a wall 
then the action has no effect. The cost of each action step is a reward of -0.1. 
To make the experiment realistic we introduce noise into the agent's actions and 
those of the predator. The agent has an 80% chance of taking the intended action, 
and a 20% chance of a random action. The agent can only access information 
about the current room. On the gateway G, the agent receives an indication and 
all sensory information switches to the other room. The sensory information 
available to the agent within each room is the agent’s own co-ordinates, the 
food and predator co-ordinates if they exist and the co-ordinates of the homes. 
Food is randomly located in both rooms. There is only one piece of food at any 
given time. The agent spends 10 time steps chewing a piece of food to get a 
reward of 10. After the food is eaten another piece of food randomly appears 
somewhere. The agent has a home that is located at one of the positions A, B, C or D. 
A predator appears periodically with a minimum period of 15 steps. The period 
may be increased by one unit with 20% probability on each step. The predator 
is always in the same room as the food and originates from the side, with 80% 
probability in the same row as the food and with 20% probability in a random row. 
The predator can navigate anywhere in the room except the agent's home. The 
predator has four actions, moving one step east, south, west or north at each time 
step, with the following probabilities 

$$\mathrm{Prob}(i) = \frac{P_i}{\sum_{j} P_j}, \qquad (9)$$

where $i = 1, 2, 3, 4$ denotes east, south, west and north respectively; 
$P_i = \exp(D_i/T)$ is the tendency strength of moving in the $i$th direction; $T$ is a 
temperature; $D_i = D_m \cos\theta_i$ is the projection of the distance between the agent 
and the predator on the corresponding direction; $D_m$ is the Manhattan distance between 
the agent and the predator; and $\theta_i$ is the angle between the direction of motion 
and the direction from the predator to the agent. When the predator captures the 
agent (their positions coincide) the agent dies and receives a reward of -100. 
The asymmetrical distribution of reward magnitudes was chosen to prioritise one 
goal over the other when both reward sources are present and thereby reward 
behaviour that adapts to changing priorities. 
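The predator's movement rule of Equation (9) can be sketched as below. Several details are assumptions made only for illustration: positions are (x, y) pairs with x increasing eastwards and y increasing northwards, the temperature value is arbitrary because the text does not report one, and the function name `predator_move_probs` is hypothetical.

```python
import math

# Unit vectors for the four moves (assumed axis convention: +x east, +y north).
DIRECTIONS = {"east": (1, 0), "south": (0, -1), "west": (-1, 0), "north": (0, 1)}


def predator_move_probs(predator, agent, temperature=1.0):
    """Softmax over the projections of the Manhattan distance onto each move direction."""
    dx, dy = agent[0] - predator[0], agent[1] - predator[1]
    manhattan = abs(dx) + abs(dy)
    norm = math.hypot(dx, dy)
    tendencies = {}
    for name, (ux, uy) in DIRECTIONS.items():
        # Cosine of the angle between the move and the predator-to-agent direction.
        cos_theta = (dx * ux + dy * uy) / norm if norm > 0 else 0.0
        tendencies[name] = math.exp(manhattan * cos_theta / temperature)
    total = sum(tendencies.values())
    return {name: p / total for name, p in tendencies.items()}


probs = predator_move_probs(predator=(0, 0), agent=(3, 0), temperature=2.0)
```

With the agent directly to the east, the eastward move receives the largest probability and the westward move the smallest; a higher temperature flattens the distribution and makes the predator's motion noisier.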

Suppose that there are two existing separate subtask hierarchical poli- 
cies: eating and escaping (see Fig. 2 right). They are learned separately using 
MAXQ [1] with the other source of reward and corresponding state variables 
removed. The experimental task is how to combine these two separate existing 
policies by adding a top node for arbitrating between these subtasks to maximize 
the entire reward received from the environment. 



4.2 Subtask Combination 

First we try to simply add a node above the two tasks. We call this the simple 
combination algorithm (SCA). The top level policy is learned using SMDP Q- 
learning according to Equation 8. State abstractions are used as shown in Table 1. 
The learning rate is set at 0.2 and $\gamma_q = 0.9$. We use Boltzmann exploration with 
an initial temperature $T = 50$ and a cooling rate $\mu = 0.995$. The temperature 
is updated by $T := \mu T$ after subtask termination. How these learning 
parameter settings were chosen is described in Section 4.4. The learning 
performance is shown in Fig. 4. The learning converges to a reward of -0.1 that 
the agent can receive by doing nothing. This is because there is no interruption 
during the execution of a subtask. For example, selection of the eating task runs 
the risk of being captured by a predator during the eating process. Therefore, 
eating is significantly discouraged in most of the states. 
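Boltzmann exploration with a geometric cooling schedule, as used for the top-level policy, might look like the following sketch. The helper names `boltzmann_select` and `cool` and the example values are hypothetical; subtracting the maximum value before exponentiating is a standard numerical safeguard and does not change the resulting probabilities.

```python
import math
import random


def boltzmann_select(q_values, temperature, rng=random):
    """Sample a subtask with probability proportional to exp(Q / T)."""
    top = max(q_values.values())
    weights = {o: math.exp((q - top) / temperature) for o, q in q_values.items()}
    threshold = rng.random() * sum(weights.values())
    cumulative = 0.0
    for option, weight in weights.items():
        cumulative += weight
        if threshold <= cumulative:
            return option
    return option  # fallback against floating-point rounding


def cool(temperature, rate=0.995):
    """Multiplicative cooling applied after each subtask termination."""
    return rate * temperature


T = 50.0
choice = boltzmann_select({"eat": 1.0, "escape": 0.0}, T)
T = cool(T)
```

At a high temperature both subtasks are chosen almost uniformly; as the temperature decays, selection becomes increasingly greedy.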

To improve the performance we modify the subtask policies to allow 
interruption during their execution. We refer to this as the modified combination 
algorithm (MCA). In the hierarchy of the eating policy all sub-policies are 
allowed to terminate when a predator can be seen by the agent. Similarly all the 
subtask policies of the hierarchy of the escaping policy are modified to terminate 
when a predator is out of the agent's sight. The performance of MCA is 
significantly better while using the same state abstractions and learning parameters as 
those of SCA (see Fig. 4). However, there are two major disadvantages with the 
modification of subtask policy hierarchies. First of all, the modification of every 
subtask policy is a very difficult job, especially for a multiple-level hierarchy of 
subtasks in a complicated and uncertain environment. Another problem is that 
the termination conditions may be subject to change in different situations. This 
approach is inflexible since it relies on prior knowledge. 

Fig. 3. The decomposition of eating and escaping tasks with their state abstractions. 
Note, 'exit' refers to location X. "food-held" is active when the food coordinate and 
the agent coordinate coincide. 

Table 1. State abstractions used in SCA, MCA and RMAXQ 

Dimension             Size  Value 
--------------------  ----  ------------------------------------------------------- 
predator-distance     10    predator-out-of-sight, Manhattan-distance to a predator 
food-distance-amount  20    (food-out-of-sight, Manhattan-distance to food) 
                            + food amount in numerical units {0, 1, ..., 10} 
home-distance         3     far (> 6), medium (3-6), close (< 3) 
food-consumed         2     yes/no 
agent-captured        2     yes/no 

4.3 The Emotion Based Algorithm 

To solve this problem we use the emotion-based hierarchical reinforcement 
learning architecture described in Section 3. We assume that subtask policies have 
already been learned in the BG. The subtask hierarchies are kept unchanged 
and an artificial emotion indication (AEI) is assigned to each subtask. We call 
this the emotion-based combination algorithm (ECA). AEIs are learned through 
experience simultaneously with the top-level policy according to Equations (5) 
and (6). The features on which the eating AEI depends are the state 
abstractions shown in Table 2. The AEIs are initialized to small arbitrary values. In 
the experiment both eating and escaping AEIs use the same learning settings: 
$\alpha_c = 0.25$, $\gamma_c = 0.9$, $\lambda_c = 0.7$. Figure 5 shows the learning results of the 
eating and escaping AEIs when learning of the top-level policy finished. The 
magnitudes of AEIs are at their highest at a feature value of 1. Then they drop 
exponentially as the distance increases. 

Fig. 4. Mean cumulative rewards averaged over 10 runs. SCA - simple combination, 
MCA - modified combination, RMAXQ - restricted MAXQ, ECA - emotion-based 
combination. The error bars are standard errors with 95% confidence coefficients. 

Table 2. State abstractions used in AEIs 

AEI       Feature value         Size  Value 
--------  --------------------  ----  ------------------------------------------------------- 
eating    food-distance-amount  20    (food-out-of-sight, Manhattan-distance to food) 
                                      + food amount in numerical units {0, 1, ..., 10} 
escaping  predator-distance     10    predator-out-of-sight, Manhattan-distance to a predator 

When the artificial emotion state (the set of AEIs) changes significantly the 
subtask is interrupted and the top level re-evaluates whether to continue it or select 
another one. Significant change in the artificial emotion state is judged using 
Euclidean distance. In the experiment the vectors of AEIs are normalized to unit 
vectors and the threshold for significant change in the artificial emotion state is 
set at 0.5. The state abstractions for ECA are shown in Table 3. The learning 
parameter settings for ECA are $\eta = 0.2$, $\gamma_q = 0.9$, $T = 50$, and the cooling 
rate is 0.98 (see Section 4.4 for an explanation of these choices). For the sake 
of comparison the actual rewards received from the environment were recorded 
during the learning process in the experiment. The learning performance is 
shown in Fig. 4. We can see that ECA converges much faster than MCA. 
Its mean cumulative reward reaches the mean reward of the human-designed 
policy (see Table 4) at approximately 160000 steps. 

Fig. 5. AEI learning results. The feature value is the state abstraction: distance-predator 
for the Escaping AEI and distance-food-amount for the Eating AEI. 

Table 3. State abstractions used in ECA 

Dimension       Size  Value 
--------------  ----  -------------------------------------------------------- 
eating AEI      6     uniformly quantized to 6 states over the range [-100, 0] 
escaping AEI    6     uniformly quantized to 6 states over the range [0, 10] 
home-distance   3     far (> 6), medium (3-6), close (< 3) 
food-consumed   2     yes/no 
agent-captured  2     yes/no 
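The interruption test and the uniform quantization of AEI values can be sketched as follows. This is only an illustration under stated assumptions: the current AEI vector is compared with the vector recorded when the running subtask was selected (the text does not say which reference point is used), a strict inequality is applied at the threshold, and the helper names `quantize`, `unit` and `should_interrupt` are hypothetical.

```python
import math


def quantize(value, low, high, bins=6):
    """Map a value in [low, high] to one of `bins` equally wide intervals."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)          # the upper bound falls into the last bin


def unit(vector):
    """Normalize a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def should_interrupt(previous_aeis, current_aeis, threshold=0.5):
    """Interrupt when the normalized AEI vectors differ by more than the threshold."""
    return math.dist(unit(previous_aeis), unit(current_aeis)) > threshold


small_change = should_interrupt([1.0, 0.0], [2.0, 0.1])   # direction barely changes
large_change = should_interrupt([1.0, 0.0], [0.0, 1.0])   # priorities swap
```

Because the vectors are normalized first, only a shift in the relative balance between the AEIs triggers re-evaluation; a uniform scaling of all AEIs does not.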

4.4 Comparison of Algorithms 

To test the performance of the proposed emotion-based algorithm, comparison 
was made to several other algorithms. Besides SCA and MCA described in 
Section 4.2 we compared ECA to a human-designed policy (HUM) and a restricted 
MAXQ algorithm (RMAXQ). HUM uses human reasoning. The pseudo-code for 
the human policy is shown in Table 4. RMAXQ is a restricted MAXQ algorithm 
in which only the top-level policy is learned using MAXQ [1]. RMAXQ uses 
the same state abstractions as SCA and MCA (see Table 1). RMAXQ also uses 
the modified subtasks of MCA. To improve the performance of the hierarchical 
policy we use non-hierarchical execution of the hierarchical policy [1,2]. That is, 
a subtask is interrupted every single primitive step and the top-level policy 
selects the most highly valued subtask to execute. In the experiment we found the 
learning would not converge. This is because at the early stages of learning the 
correct value function has not been constructed yet and greedy execution of 
subtasks leads to frequent interruption of this process. Using Dietterich's method [1] 
we set a variable interval for subtask execution with an initial value of 50. The 
interval is decreased to 1 as learning progresses. 

In the experiments we carefully tuned the learning parameters for every algorithm 
so that each algorithm can be compared at its best performance, as follows. We 
tested learning rates ranging from 0.1 to 0.35 and found that their effect on learning 
convergence speed is less than 5%. Therefore we chose a learning rate of 0.2 for all 
algorithms. An initial temperature of 50 for Boltzmann exploration is used for all 
the algorithms. Since cooling rates have a greater influence on the learning 
convergence, 10 runs were performed with cooling rates ranging from 0.9 to 0.999 and 
the cooling rate achieving the fastest convergence was chosen for each algorithm. 
The cooling rate for RMAXQ is 0.998. Mean cumulative reward was recorded during the 
learning process. Figure 4 shows the learning performance averaged over 10 runs. 
The detail of Fig. 4 is illustrated in Fig. 6. The convergence speed of ECA is 
much faster than that of the other algorithms. It is worth noting that RMAXQ converges 
more slowly than MCA until about 150000 steps. Then it overtakes MCA and reaches 
the mean reward of the HUM policy at about 500000 steps. 

Table 4. Pseudo-code for the human designed policy. 

1. s ← current observation; 
2. if no predator, go to eat; 
3. elseif in gateway, do nothing; 
4. elseif hold food and food-amount is less than predator-distance + 2, 
   continue chewing; 
5. elseif home in sight, flee home; 
6. else flee to room-exit; 
7. go to 1; 
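The decision rule of the human-designed policy in Table 4 can be written as a single function that is called once per observation (the "go to 1" step becomes the caller's loop). The `Observation` fields and the returned subtask names are hypothetical labels introduced only for this sketch; the order of the tests follows the table.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    predator_visible: bool
    in_gateway: bool
    holding_food: bool
    food_amount: int
    predator_distance: int
    home_in_sight: bool


def human_policy(obs):
    """Return the subtask chosen by the hand-written rule for one observation."""
    if not obs.predator_visible:
        return "eat"
    if obs.in_gateway:
        return "do_nothing"
    if obs.holding_food and obs.food_amount < obs.predator_distance + 2:
        return "continue_chewing"        # the food can be finished before capture
    if obs.home_in_sight:
        return "flee_home"
    return "flee_to_exit"


decision = human_policy(Observation(True, False, True, 3, 4, True))
```

In the example the agent holds food with 3 units left while the predator is 4 steps away, so the rule keeps chewing rather than fleeing.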

Mean cumulative reward (MCR) over the last 10000 steps was recorded for 
each algorithm to make fair comparisons. We also take the MCR of MCA and 
RMAXQ during steps 190000 to 200000, which is the same number of learning steps as 
for ECA. The MCR of HUM is recorded for 10000 steps averaged over 10 runs. 
These results are shown in Fig. 7. After 200000 learning steps ECA, MCA and 
RMAXQ achieve mean MCRs of 0.210, 0.169 and 0.189 respectively, which are 
all better than that of HUM. At 500000 steps MCA reaches 0.203 and RMAXQ 
achieves 0.211, which is close to the MCR of ECA at 200000 steps. We suggest 
two reasons why ECA learns faster than MCA and RMAXQ. Firstly, artificial 
emotion indicators aggregate sensory inputs into task-specific variables, AEIs, 
which serve as a selective perception of the environment in terms of the agent's 
behavioural priorities. AEIs provide compact state abstractions for the top-level 
policy. The second reason is that ECA terminates the subtask based on AEIs, 
which are estimates of the subtask's value functions based on a decomposition by 
source of reward. Thus, the subtask is terminated when necessary according 
to subtask value considerations. MCA, in contrast, terminates the subtask by 
human-designed criteria, for which it is not possible to include all necessary 
termination predicates. RMAXQ terminates the subtask at every single primitive 
step. When the subtask value function has not been well determined, frequent 
interruption may slow down the learning process. ECA achieves a balance 
between MCA and RMAXQ with regard to subtask termination. 

Fig. 6. Detailed learning performance of Figure 4. The cumulative rewards are averaged 
over 10 runs. SCA - simple combination, MCA - modified combination, RMAXQ - 
restricted MAXQ, ECA - emotion-based combination. The error bars are standard 
errors with 95% confidence coefficients. 

4.5 Discussion 

The above experimental results indicate that the simple combination of subtasks 
leads to poor selection of subtask policies. In order to achieve better performance, 
modification of subtask definitions is necessary. The modification of a subtask 
is difficult if the task and environment are very complicated. For example, it is 
almost impossible for the human programmer to precisely include all necessary 
termination conditions for a complex task. Therefore, the modified combination 
algorithm may miss some important events, limiting its long-term learning per- 
formance. Greedy non-hierarchical execution of a hierarchical policy can achieve 
an even better top-level policy than the modified combination algorithm. In non- 
hierarchical execution programming, for every single primitive step the subtask 
is forced to terminate and the program returns to the upper level policy that 
called it. Thus, the program returns to the top-level all the way from the bottom 
level. 

Fig. 7. Comparison of mean and standard error of mean cumulative reward over 10 
runs. t1 denotes sampling from 190000 to 200000, and t2 from 490000 to 500000. SCA 
is presented by a line because the variance is very small. The error bars are standard 
errors with 95% confidence coefficients. 

Then, the program selects the most highly valued subtask to execute from 
the top-level down to the bottom level. Therefore, if the subtask value func- 
tions of all levels are well determined, the greedy non-hierarchical execution of 
a hierarchical policy can guarantee an optimal solution. However, if the state- 
subtask values are far from the correct values, for instance and most importantly, 
during early learning stages or when the environment has changed, the greedy 
non-hierarchical execution of a hierarchical policy causes the executing subtask 
to be frequently interrupted. The frequent interruption may then slow down 
the learning process. This is usually the case for many applications, e.g. a real 
autonomous robot in an uncertain environment. Greedy non-hierarchical execu- 
tion of a poorly-determined hierarchical policy may lead to a catastrophe for the 
agent. For animals, emotion is a critical built-in mechanism developed through 
generation-to-generation evolution to help them survive in ever-changing com- 
plex environments. Inspired by this idea, in the emotion-based algorithm, the 
artificial emotion mechanism aggregates the sensory information into artificial 
emotion indications (AEIs) that reflect the agent’s reward related priorities. AEIs 
can also be thought of as an approximation of subtask value functions decom- 
posed according to reward sources, which provide a selective perception of the 
environment, paying particular attention to a significant change in the environ- 
ment concerning a specific task. The AEIs are independent of each other. This 
allows the program to more easily generalize over sensory state space. Another 
disadvantage of the greedy non-hierarchical execution of a hierarchical policy is 
that the program is required to terminate subtasks every single primitive step. 
This minimum subtask execution time is then common to all subtasks. How- 
ever, some subtask values may be learned faster than others. This may then 
also contribute to slower convergence. Furthermore, if the hierarchical policy is 
complicated, this approach is computationally expensive. In contrast, the 
emotion-based algorithm uses AEIs as a transition-triggering mechanism. That is, only 
a change in AEIs is used to terminate subtasks. AEIs decompose the top-level 
value function according to the sources of reward. Subtask values and termina- 
tion conditions are learned together in co-operation with each other. Thus, AEIs 
can exploit factors of reward to bias the learning of subtask values by terminat- 
ing the specific subtask at the appropriate time, allowing the program to keep 
a balance between unnecessary interruption and missing an important event. 
From the perspective of programming the greedy non-hierarchical execution of 
a hierarchical policy is similar to a polling technique, while the emotion-based 
algorithm is similar to interrupt-driven programming because its subtask is only 
interrupted when the AEIs change significantly. Therefore we argue that the in- 
terruption mechanism in the AEI based algorithm aids learning convergence by 
ensuring that subtasks are not interrupted unnecessarily often. 

By using an artificial emotion mechanism multiple hierarchical policies can be 
easily combined and arbitrated without modification. Emotional variables can be 
learned through experience simultaneously with the top-level policy. Therefore, 
the system based on the emotion-based algorithm is flexible and autonomous. 
Furthermore, the artificial emotion mechanism allows existing policies to be re- 
used in new contexts. For example, suppose we assume an agent has learned 
a policy to arbitrate two subtasks: eat an edible object and avoid an aversive 
object. If the agent sees a novel object and classifies it to be edible or aversive, 
the policy hierarchy can stay unchanged. If the reward corresponding to the 
object changes, e.g. the perceived object changes from edible to aversive, then 
the emotional association is reversed while the policy hierarchy is unchanged. 
Therefore, to this extent, the proposed method possesses the capability to rapidly 
adapt to the environment. 



4.6 Related Work 

Sutton, Precup, Singh and Ravindran [2] illustrated that performance can be 
improved if subtasks are allowed to interrupt before their termination states are 
reached. Similarly Dietterich used greedy non-hierarchical execution of a hierar- 
chical policy to improve performance [1]. The greedy execution of a hierarchical 
policy may cause problems at the early stages of learning since the value 
functions have not been properly developed yet [1]. Gadanho and Hallam [12] used an 
emotion-based event-detection mechanism to decide termination of behaviours. 
In their work behaviour transition is triggered by the detection of significant 
change in emotions. Emotions are analytically calculated by the programmer 
based on sensory inputs, while in our algorithm artificial emotion indications 
are learned simultaneously along with the top-level policy. Shelton [4] addressed 
multiple source of reward problems. By keeping track of the sources of the re- 
wards he derived an algorithm which immediately adapts to the presence or 
absence of a reward. In this paper we used multiple reward sources to decom- 
pose tasks. 
5 Conclusions 

An emotion-based hierarchical reinforcement learning system has been presented, 
inspired by the neurobiology of emotion processing in the brain. Comparison 
experiments with other conventional hierarchical reinforcement learning algorithms 
were conducted under the same conditions. The experimental results show that 
inclusion of the artificial emotion mechanism allows the reuse of subtask policies 
without any modification and the learning performance is significantly improved 
compared with the simple combination, modified combination, and restricted 
MAXQ algorithms. In particular our approach provides an alternative to greedy 
non-hierarchical execution of hierarchical policies during the early stages of learn- 
ing which ensures that subtasks are only interrupted when changes in subtask 
value are significant. In subsequent work the reward classification will be learned, 
with a view to adapting to previously unseen sources of reward. 
Abstract. We present two ways in which dynamic self-assembly can be used to 
perform computation, via stochastic protein networks and self-assembling 
software. We describe our protein-emulating agent-based simulation infra- 
structure, which is used for both types of computations, and the few agent 
properties sufficient for dynamic self-assembly. Examples of protein-network- 
based computation and self-assembling software are presented. We describe 
some novel capabilities that are enabled by the inherently dynamic nature of the 
self-assembling executable code. 



1 Introduction 

Dynamic self-assembly (consisting of the energy-dissipating processes of building up, 
tearing down, and dynamically modifying structures or patterns) is a ubiquitous proc- 
ess in non-equilibrium physical and biological systems. [1] Protein networks carry 
out much of the molecular-scale directed transport, assembly, communication, and 
decision-making activity within and across cells, and do so via dynamic self- 
assembly. It has been suggested that protein networks play a computational role in 
single cells analogous to that of neural networks in multi-cellular organisms. [2] 
Biomolecular systems provide models for guiding the development of molecular- 
based computing and self-assembly technologies. For example, chemical systems [3- 
5] and DNA-based systems for computing [6,7] have been discussed. In this paper, 
we explore how dynamic self-assembly, such as is carried out by protein networks, 
can be used to perform computation, and ask: Can any arbitrary Turing machine be 
implemented? If so, what are the key properties required of the proteins? How reli- 
able would such Turing machinery be, and how would errors be corrected? 

A major motivation for exploring computation via protein networks is that biologi- 
cal systems are robust, dynamic, adaptable, and self-healing — all properties that are 
highly desirable for information technologies. By abstracting the key properties that 
allow protein networks to implement computation via dynamic self-assembly, we 
hope to achieve a new “self-assembling software” approach that provides robust, 
dynamic, adaptable, self-healing software. 
We have identified a few crucial properties of proteins and their interactions that 
are sufficient to enable the processes of dynamic self-assembly and computation. (1) 
Proteins have tremendous selectivity of their binding sites, operating much like a lock 
and key. (2) Binding or unbinding a ligand at one of these sites can result in a con- 
formational change of another part of the protein. This conformational change can 
perform some sort of actuation, such as moving (e.g., in motor proteins) or catalyzing 
an assembly or disassembly reaction (e.g., in enzymes). (3) A conformational change 
can also expose (or hide) additional binding sites, which in turn can bind and cause a 
conformational change resulting in actuation, or exposing or hiding yet another 
binding site. 

We abstract these important self-assembly and computational properties of pro- 
teins into an “agent,” the fundamental building block of our self-assembling software. 
An agent can store data, perform some simple or complex computation, or both. 
Each agent has one or more binding sites (each labeled with a numeric key) that can 
bind only to complementary sites (property (1)). Sites are said to be complementary 
when they have keys with the same absolute value but opposite sign. Once bound, 
property (2) enables the agent to actuate (perform its computation). Property (3) 
enables it to then bind to another agent, to trigger it to execute next. 
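The key-matching rule for binding sites can be sketched in a few lines. The `Site` class, the use of `None` to represent an invalid (hidden) key, and the helper names are assumptions made for illustration and are not taken from the simulation infrastructure itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Site:
    key: Optional[int]                  # None models an invalid key, i.e. a hidden site
    partner: Optional["Site"] = None    # the site this one is currently bound to


def complementary(a, b):
    """Keys match when they have the same absolute value but opposite sign."""
    if a.key is None or b.key is None:
        return False
    return a.key != 0 and a.key == -b.key


def bind(a, b):
    """Bind two free, complementary sites; each site holds at most one partner."""
    if a.partner is not None or b.partner is not None or not complementary(a, b):
        return False
    a.partner, b.partner = b, a
    return True


done_a, trigger_b, trigger_x = Site(7), Site(-7), Site(-7)
first = bind(done_a, trigger_b)     # complementary and free: binds
second = bind(done_a, trigger_x)    # done_a is already bound: refused
```

Changing a key is all that is needed to redirect which agent a site can bind to, which is the basis of the re-wiring described later.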

A specific execution sequence or biological signaling pathway can be “wired” to- 
gether by including a set of agents with binding sites that drive them to execute se- 
quentially. For example, suppose each agent has a “trigger” site that activates it 
(causes it to execute some code) and a “done” site that is exposed when its task is 
complete. Suppose agent A’s done site is complementary to agent B’s trigger site, 
agent B’s done site is complementary to agent C’s trigger site, and agent C’s done site 
is complementary with agent D’s trigger site. Once A is triggered, then B will exe- 
cute, followed by C, followed by D. 

It is important to note that such an execution sequence or pathway is not hard- 
coded, but self-assembled. That is, the agents are just “dumped” into the simulation 
environment, and the execution order occurs as a natural consequence of the order in 
which binding and unbinding events occur. As a result, the self-assembling executa- 
ble code is inherently dynamic in nature. The structure of the executing code is as- 
sembling and disassembling all the while it is executing, with execution pathways that 
are driven dynamically by matching keys between agents. All that is required to 
change the execution pathway — “re-wire” what the code does, or turn code on or 
off — is a change of keys. 

This feature leads to innovative and powerful capabilities in software developed by 
this approach. For example, changes to existing self-assembled code (due to chang- 
ing user requirements, or to reuse an existing self-assembled software package for 
another application) can also be self-assembled, without having to modify the original 
source code or shut down the running program. This is achieved by adding new 
agents with keys such that they rewire the existing code, even while it is running. 
Another example is “situation detection,” a mechanism for “sensing” whenever cer- 
tain conditions or events occur by providing passive agents with empty binding sites. 
These binding sites correspond to the conditions of interest, and when all sites are 
bound, the sensing agent is activated to report or trigger a desired response. Situation 
detection is asynchronous. It is also passive, in that no repeated active polling by the 
agent itself is required to detect the events. This capability can also be used to 
interrogate or monitor the code itself during runtime. For example, if a program is using 
up a tremendous amount of CPU without producing any output, a query (a special 
case of situation detection) can be constructed to determine what the program (which 
was perhaps written by another programmer) is doing right now. These and other 
unusual capabilities are a consequence of the protein-network-emulating approach of 
self-assembling software. Although we are very early in the development of this self- 
assembling software technology, we have already demonstrated that the approach is 
dynamic and adaptable. Current and future work will focus on the issues of robust- 
ness, evolutionary (autonomous) adaptability, and self-healing. 
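Situation detection can be illustrated with a passive sensing agent that fires once all of its binding sites are bound. The sketch below is a simplified model under stated assumptions: the class `SensingAgent`, its `offer` method, and the numeric keys are hypothetical, and the surrounding environment is assumed to call `offer` whenever a site becomes exposed, so the sensor itself never polls.

```python
class SensingAgent:
    """Passive sensor that fires once every one of its binding sites is bound."""

    def __init__(self, condition_keys, on_detect):
        self.empty_sites = set(condition_keys)   # keys of the still-unbound sites
        self.on_detect = on_detect
        self.fired = False

    def offer(self, key):
        """Called by the environment when a site with `key` becomes exposed."""
        if -key in self.empty_sites:              # complementary key: bind it
            self.empty_sites.remove(-key)
        if not self.empty_sites and not self.fired:
            self.fired = True
            self.on_detect()


reports = []
sensor = SensingAgent({-11, -12}, lambda: reports.append("situation detected"))
sensor.offer(11)   # first condition occurs: one site is still empty
sensor.offer(99)   # unrelated key: nothing binds
sensor.offer(12)   # second condition occurs: all sites bound, the sensor fires
```

The conditions may occur in any order and at any time, which reflects the asynchronous nature of the mechanism.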



2 Simulation Infrastructure 

Stochastic simulations are an effective tool for modeling the dynamics of small pro- 
tein networks, and have been used, for example, to understand protein network prop- 
erties responsible for the robust adaptivity of chemotaxis. [8] Because the properties 
of the proteins and our self-assembling software agents are, by design, so similar, we 
have built a common agent-based simulation infrastructure for use both in the sto- 
chastic simulations of protein networks and for the self-assembling software. In the 
protein network simulations, an agent represents a single protein or protein complex. 
In the self-assembling software, an agent stores data and/or performs an atomic com- 
putational operation (such as adding two numbers, solving an equation, or writing 
some output to a file). 

An agent is constructed from a sequence of parts. These parts are roughly analo- 
gous to protein domains, except that only those domains with binding sites are in- 
cluded. The detailed physics and chemistry of conformational changes is not mod- 
eled. Instead, we directly model the properties of the agent that matter for self- 
assembly and computation — the actuation and exposing/hiding of other binding sites. 
Each part has a binding site that can be bound to at most one other site at any time. 
Each site has a numeric key that can either be invalid (hiding the site, preventing it 
from binding), or that only allows binding with complementary sites. Thus, this 
binding is a selective process as in biological systems (property (1) of the Introduc- 
tion). Matching binding sites can be thought of as having a virtual attraction, since 
binding will readily occur between them when they become available (by becoming 
exposed or unbound from an existing ligand). 

Each binding site can have two types of events, binding and unbinding, and has an 
“event handler” associated with each event type. These event handlers are executable 
code, and implement properties (2) and/or (3) of the Introduction. For example, in a 
self-assembling software system, a binding event at site A could trigger the summa- 
tion of two numbers (property (2)) and also expose site B for binding (property (3)). 
A comparable example from the protein network simulations would be a kinase that, 
when bound to a substrate, phosphorylates the substrate, releases it, and then hides its 
own substrate-binding site until the kinase is activated again. All of the “action” of 
the agent, then, is coded in the event handlers. In other words, the stochastic binding 
or unbinding of these sites triggers the deterministic execution of code, whether that
code represents a physical process (for protein network simulations) or an informa-
tion process (for self-assembling software).

Initially, a population of agents is included in the simulation environment. The 
simulation infrastructure locates any exposed sites with complementary keys, and 
schedules binding events for these sites on an event queue, ordered by the scheduled 
event time. Any unmatched sites are placed on a free-site list to wait passively until a 
complementary site becomes available. The simulation proceeds by pulling the first
event from the event queue, binding the designated sites to each other (essentially,
setting the two sites’ pointers to point to each other), and executing the two sites’
binding event handlers. During the execution of the event handlers, a number of
things could happen. (a) Some physical actuation or a calculation could be performed.
(b) A binding site could be exposed. If a site with a complementary key is
found on the free-site list, a binding event is scheduled. If no complement is found,
the site is placed on the free-site list. (c) A site could be hidden. If that site is associated
with a scheduled event, that event is canceled. If the site was on the free-site list,
it is removed. (d) The key of a site could be changed. Any scheduled event that involves
the site is corrected, and/or the free-site list is corrected, to reflect the new key.
(e) An unbinding event could be scheduled.

The simulation proceeds by pulling the next event from the queue, binding or un- 
binding the designated sites, according to the event type, and then executing the event 
handlers. (The same possibilities (a)-(e) could occur during the execution of an un- 
binding event handler.) This process continues until there are no more events on the 
event queue, or the maximum desired time is reached. The implementation details 
have been described elsewhere. [9] 
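
The scheduling loop described above can be sketched as follows. This is an illustrative reduction, not the implementation of [9]: it covers only exposing a site, matching against the free-site list, and handling binding events. The names `Site`, `Simulation`, `expose`, and `run` are hypothetical, binding times use a fixed delay rather than stochastic times, and hiding, key changes, event cancellation, and unbinding are omitted.

```python
import heapq
import itertools
from typing import Callable, Dict, List, Optional, Tuple


class Site:
    """A binding site with a key and an optional binding event handler."""

    def __init__(self, name: str,
                 on_bind: Optional[Callable[["Simulation", "Site"], None]] = None):
        self.name = name
        self.key = 0                      # assumed convention: 0 means hidden
        self.partner: Optional["Site"] = None
        self.on_bind = on_bind


class Simulation:
    def __init__(self, bind_delay: float = 1.0):
        self.now = 0.0
        self.bind_delay = bind_delay      # fixed delay, an assumption of this sketch
        self.queue: List[Tuple[float, int, Site, Site]] = []   # event queue (heap)
        self.free_sites: Dict[int, List[Site]] = {}            # key -> waiting sites
        self._seq = itertools.count()     # tie-breaker so sites are never compared
        self.log: List[Tuple[float, str, str]] = []

    def expose(self, site: Site, key: int) -> None:
        """Expose `site`; schedule a binding event or park it on the free-site list."""
        site.key = key
        waiting = self.free_sites.get(-key)
        if waiting:
            partner = waiting.pop(0)
            event = (self.now + self.bind_delay, next(self._seq), site, partner)
            heapq.heappush(self.queue, event)
        else:
            self.free_sites.setdefault(key, []).append(site)

    def run(self, t_max: float = float("inf")) -> None:
        """Pull events in time order until the queue is empty or t_max is passed."""
        while self.queue and self.queue[0][0] <= t_max:
            self.now, _, a, b = heapq.heappop(self.queue)
            a.partner, b.partner = b, a               # bind the two sites
            self.log.append((self.now, a.name, b.name))
            for site in (a, b):                       # run both binding handlers
                if site.on_bind is not None:
                    site.on_bind(self, site)


# Agent A: a binding event at its trigger site exposes its done site (key 5).
done = Site("A.done")
trigger = Site("A.trigger", on_bind=lambda sim, site: sim.expose(done, 5))

sim = Simulation()
sim.expose(trigger, 1)          # no complement yet: waits on the free-site list
sim.expose(Site("next"), -5)    # waits until A.done is exposed
sim.expose(Site("start"), -1)   # complement of the trigger: binding is scheduled
sim.run()
```

After `sim.run()`, `sim.log` records the start signal binding to the trigger site first, followed by the done site binding to the waiting `next` site.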

A specific execution sequence or biological signaling pathway can be “wired” to- 
gether by including a set of agents with keys that drive them to execute sequentially. 
The A → B → C → D pathway described in the Introduction was wired together by
assigning complementary keys to the done site of one agent and the trigger site of the 
next. We reemphasize here that such an execution sequence or pathway is not hard- 
coded, but self-assembled. The agents are just “dumped” into the simulation envi- 
ronment, and the execution order occurs as a natural consequence of the binding and 
unbinding events that are pulled from the event queue. 

A natural property of this approach is the self-assembly of concurrent non- 
deterministic execution pathways in parallel, or multi-threading execution paths. For 
example, the A → B → C → D pathway can be executed in parallel with a completely
different pathway Q → R → S → T, as long as the keys from one pathway do
not match those of the other. To synchronize multiple threads (for example, if agent 
U can execute only after both D and T have completed execution), no special syn- 
chronizing code is required. The synchronizing agent (U) simply waits passively 
with its keys on the free-site list until triggered (when D’s and T’s done sites bind to 
U). 
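
A deterministic toy model can illustrate both the key-based wiring and the passive synchronization. The sketch below is an assumption-laden simplification: the `Agent` fields, the specific key values, and the first-in-first-out handling of signals are invented for illustration, and stochastic timing is ignored. Each agent fires once all of its trigger sites are bound, then exposes its done key.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Agent:
    name: str
    triggers: Set[int]      # keys of the trigger sites; all must be bound to fire
    done: int               # key exposed on the done site after execution
    bound: Set[int] = field(default_factory=set)


def make_agents() -> List[Agent]:
    """Two independent pathways plus a synchronizing agent U (hypothetical keys)."""
    return [
        Agent("A", {-1}, 2), Agent("B", {-2}, 3),
        Agent("C", {-3}, 4), Agent("D", {-4}, 5),
        Agent("Q", {-11}, 12), Agent("R", {-12}, 13),
        Agent("S", {-13}, 14), Agent("T", {-14}, 15),
        Agent("U", {-5, -15}, 99),   # waits for both D's and T's done sites
    ]


def run(agents: List[Agent], start_signals: List[int]) -> List[str]:
    """Return the order in which the agents execute."""
    waiting: Dict[int, Agent] = {}                # free-site list: trigger key -> agent
    for agent in agents:
        for key in agent.triggers:
            waiting[key] = agent
    signals = deque(start_signals)                # exposed keys looking for a partner
    order: List[str] = []
    while signals:
        key = signals.popleft()
        agent = waiting.pop(-key, None)           # complementary trigger site, if any
        if agent is None:
            continue                              # no complement: nothing happens
        agent.bound.add(-key)
        if agent.bound == agent.triggers:         # every trigger site is now bound
            order.append(agent.name)
            signals.append(agent.done)
    return order


order = run(make_agents(), [1, 11])
```

The execution order follows from the keys alone, so shuffling the list of agents before calling `run` leaves it unchanged, and U executes only after both D and T have completed.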

“Encapsulants” effectively create local environments in which collections of free 
binding sites can interact. Encapsulants in our approach are meant to resemble biological
cell membranes that isolate their internal contents from interactions with external
structures. Thus, identical A → B → C → D pathways could be executing in
parallel in different encapsulants, without any interference, even though they have
matching keys. Encapsulants can contain agents as well as other encapsulants (for
hierarchical organization). They also contain “surface” agents that act as signals or
receptors for interaction with other encapsulants, or gates to move agents and other 
encapsulants into and out of the encapsulant. These surface agents, analogous to 
transmembrane proteins in biological cells, manage all external interactions of the 
encapsulant, and allow it to act as an “agent” building block for structures and execu- 
tion pathways at another (higher) hierarchy level. 
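
One way to picture the isolation provided by encapsulants is to give each one its own free-site list, so that matching keys in different encapsulants never meet. The sketch below is hypothetical in every name (`Encapsulant`, `post`, `route`, `gate_key`) and reduces the surface agents to a single gate key per encapsulant.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Encapsulant:
    name: str
    gate_key: int                                   # key accepted by the surface gate
    free_sites: Dict[int, List[str]] = field(default_factory=dict)

    def post(self, key: int, site_name: str) -> Optional[str]:
        """Post a site locally; return the name of a matched partner, if any."""
        partners = self.free_sites.get(-key)
        if partners:
            return partners.pop(0)                  # complementary site found inside
        self.free_sites.setdefault(key, []).append(site_name)
        return None


def route(agent_key: int, encapsulants: List[Encapsulant]) -> Optional[Encapsulant]:
    """Return the encapsulant whose gate is complementary to the agent's key."""
    for enc in encapsulants:
        if enc.gate_key == -agent_key:
            return enc
    return None


cell1 = Encapsulant("cell1", gate_key=-7)
cell2 = Encapsulant("cell2", gate_key=-8)
```

Because matching is performed per encapsulant, a site posted in `cell1` is invisible to a complementary site posted in `cell2`.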



3 Computing with Protein Networks 

3.1 RAM Machine Computing with Proteins 

We wish to examine how the self-assembly processes of protein networks can be 
harnessed to perform computation. Instead of dealing with Turing machines directly, 
we will discuss RAM machines. [10] A RAM machine is more directly realizable 
using proteins. Turing and RAM machines are equivalent, i.e., any Turing machine 
can be assembled from a suitable set of RAM machines, and vice-versa. [10] RAM 
machine computing requires an ordered sequence of operations that are carried out on 
a small set of idealized integer registers (each of unlimited capacity). Any computa- 
tion can be programmed using only two types of operations: those that increment a 
particular register by 1 ([+]reg); and those that either decrement a particular register
by 1 (if the register is nonzero) or else jump to some other part of the program se- 
quence ([-]reg/jump). Thus, to construct a RAM machine from the protein-emulating 
agents described in Section 2, we need agents that represent registers, agents that 
perform the increment operation on each register, and agents that perform the decre- 
ment/jump operation on each register. 
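
For readers unfamiliar with this two-instruction model, a deterministic interpreter is sketched below. The tuple encoding of instructions and the function name `run_ram` are assumptions made for illustration; the protein-based machine described in this section is stochastic and has no explicit program counter.

```python
from typing import Dict, List, Optional, Tuple, Union

# An instruction is ("inc", reg, next) or ("dec", reg, next, jump).
# `next` and `jump` are instruction indices; None means halt.
Instr = Union[Tuple[str, str, Optional[int]],
              Tuple[str, str, Optional[int], Optional[int]]]


def run_ram(program: List[Instr], registers: Dict[str, int],
            max_steps: int = 10_000) -> Dict[str, int]:
    """Execute a two-instruction register machine and return the final registers."""
    regs = dict(registers)
    pc: Optional[int] = 0
    for _ in range(max_steps):
        if pc is None:
            return regs
        op = program[pc]
        if op[0] == "inc":                 # [+]reg
            regs[op[1]] += 1
            pc = op[2]
        elif op[0] == "dec":               # [-]reg/jump
            if regs[op[1]] > 0:
                regs[op[1]] -= 1
                pc = op[2]
            else:
                pc = op[3]                 # register is zero: take the jump
        else:
            raise ValueError(f"unknown operation {op[0]!r}")
    raise RuntimeError("step limit reached")


# Add register B into register A, leaving B at zero.
add_b_to_a: List[Instr] = [
    ("dec", "B", 1, None),   # 0: if B > 0, decrement and go to 1; otherwise halt
    ("inc", "A", 0),         # 1: increment A, then loop back to 0
]
result = run_ram(add_b_to_a, {"A": 2, "B": 3})
```

Starting from A = 2 and B = 3, the program halts with A = 5 and B = 0.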

A unary representation [10] for integers allows the size of any clone population of 
assembled molecules to serve as a register. The register molecules can be free- 
floating or can be assembled into polymers. We use the phosphorylated state (pA’) of 
a model protein (pA) as an individual count of a register (called A). To be more con- 
crete, if five of the pA proteins are phosphorylated, then the value of register A is 
five. A kinase that can phosphorylate protein pA can act as the increment agent [+]A,
if it can be activated and can signal as described below. Similarly, a phosphatase that 
can dephosphorylate pA’ can decrement register A if it is nonzero. Different registers 
are made from different types of proteins. 

Ordered sequences of [+]reg and [-]reg/jump are dynamically self-assembled by
switching on the appropriate agent at the appropriate step in the computation se- 
quence. The system produces an ordered sequence of computational operations by 
temporal activation, rather than through spatial wiring. To implement this, we con- 
sider protein complexes that must be triggered by another selective signaling protein 
to become active. Similarly, these protein complexes must release another signaling 
protein to activate the next protein agent. Allosteric proteins with unique binding site 
selectivity and switchable binding site dynamics are ideal for creating the unique 
sequences of protein activity needed for computation. Signal cascades can also be 
implemented, so that parallel execution pathways can be triggered. The timings of the
sequence depend on bonding rates that in turn depend on molecular arrival time sta-
tistics. Thus, the computation execution times are stochastic.

Decisions/branchings are carried out by the exit pathways of the [-]reg/jump
agents. This means that these agents must be able to release two alternate signaling 
molecules — one if a dephosphorylation actually occurred, and a different jump signal 
molecule after a “waiting” time in which no register protein binds to this agent. The 
“jump” molecule clearly must be released with a rate that is, on average, slow com- 
pared to the arrival time of a register protein (when one is present). Also, the arrival 
of the register protein must prevent the jump molecule from being released. These 
properties are designed into the event handlers of the agent’s binding sites, and are of 
similar complexity to those of a conventional kinesin protein that “walks” along a 
microtubule in a eucaryotic cell. [11] The hydrolysis of ATP drives cyclic irreversible 
behavior. 

Fig. 1 illustrates the interactions of the [-]reg/jump agent with the signaling proteins,
register proteins, and ATP. Agents are represented by polygon shapes. The
binding sites and key values are shown as tabs at the perimeter of the agent. When the 
sites of two agents are bound, they are shown as touching. The [-]reg/jump agent is 
labeled, as are the ATP agents. The two collections of agents to the right represent a 
single register. The phosphorylated version of the register protein is shown in a 
lighter gray. Initially, in panel (a), the value of the register is five, and the [-]reg/jump 
agent has a single “trigger” binding site exposed, with a key of 1. It also has four 
other sites that are hidden (they have an invalid key, 0). In panel (b), when a signal- 
ing agent with a complementary key of -1 binds with the trigger site of the 
[-]reg/jump agent, two additional sites are exposed, with key values of 2 and 3. When 
the trigger site unbinding event is handled, if both the ATP and register proteins are 
bound to these two sites (as in panel (b)), then in panel (c), the hydrolysis of ATP 
drives the [-]reg/jump agent to dephosphorylate the register protein (note that in panel 
(c), there are only four phosphorylated register proteins, and an additional unphos- 
phorylated version), release it and the “spent” ATP, and expose the “done” site with a 
key of 5. A signaling protein with a key of -5 binds to the done site. When released, 
it will trigger the next operation in the execution sequence. 

If there had been no register protein bound when the trigger site unbinding event 
was handled, then the “jump” site (lower right site of the [-]reg/jump agent in Fig. 1)
would have been exposed with a key of 6, rather than the done site with a key of 5. 
As a result, a different signaling protein would become bound to the jump site, and a 
different execution path would follow. Certainly, if there are no phosphorylated 
versions of the register protein (i.e., the register value is zero), then the jump pathway 
will be taken. However, due to the stochastic nature of the “race” between the binding 
event of the register site and the unbinding event of the trigger site, the stochastic 
jump process will produce incorrect jumps (when the register is nonzero) with some 
probability that depends on the relative rates involved. 
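
The dependence on the relative rates can be made concrete under a simplifying assumption that is not stated in the text: if the trigger-site unbinding time and the register-protein binding times are independent and exponentially distributed, the jump is taken with probability k_unbind / (k_unbind + n * k_bind), where n is the number of phosphorylated register proteins. The sketch below compares this expression with a direct simulation of the race. The rate names and function names are hypothetical, and ATP binding is ignored.

```python
import random


def jump_probability(n_register: int, k_bind: float, k_unbind: float) -> float:
    """Probability that the trigger unbinds before any register protein binds."""
    if n_register == 0:
        return 1.0                       # register is zero: the jump is always taken
    return k_unbind / (k_unbind + n_register * k_bind)


def simulate_jump_fraction(n_register: int, k_bind: float, k_unbind: float,
                           trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the same probability by simulating the race directly."""
    rng = random.Random(seed)
    jumps = 0
    for _ in range(trials):
        t_unbind = rng.expovariate(k_unbind)
        # First arrival among n independent register proteins.
        if n_register > 0:
            t_bind = rng.expovariate(n_register * k_bind)
        else:
            t_bind = float("inf")
        if t_unbind < t_bind:
            jumps += 1
    return jumps / trials


analytic = jump_probability(5, k_bind=1.0, k_unbind=0.1)
estimate = simulate_jump_fraction(5, k_bind=1.0, k_unbind=0.1)
```

Under these assumptions, the probability of an incorrect jump is largest when only one phosphorylated register protein remains, since the effective binding rate scales with n.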

The increment agent, [+]reg, is similar to, but slightly simpler than, the decrement
agent. The binding of the trigger site exposes the ATP- and register-protein-binding
sites. The ATP key is the same, 2, but in this case the register-protein-binding site’s
key is 4, to bind to the unphosphorylated version of the register protein. To increment
the register, it phosphorylates the register protein (i.e., changes its key to -3),
then exposes a done site with a key of 8. There is no “jump” associated with the
increment operation.




Fig. 1. Illustration of the decrement operation. (a) The [-]reg/jump and ATP agents are labeled.
The two collections of agents to the right represent a single register with a value of 5
(phosphorylated proteins are lighter gray). (b) When the [-]reg/jump agent is triggered, it binds
to an ATP and a phosphorylated register protein. (c) Then it dephosphorylates the register
protein, thereby decrementing the register, releases the ATP and register protein, and signals
success



We have implemented simulated protein networks for elementary operations such 
as zeroing a register, register copying, adding the contents of one register to another,
using a register to control the number of loops through a repeated sequence of agent
operations, multiplying two register contents into a third register, and computing a 
modulus of a register value. Stochastic versions of any deterministic Turing machine 
can in principle be obtained using dynamic self-assembly of proteins that exhibit 
commonly available properties. 

To illustrate how this simple set of agents can accomplish such computations, Fig.
2 shows a schematic diagram of the network of proteins required to multiply two 
registers, A and B, into a third register G. Only the increment and decrement agents 
are shown. Each of these agents in the actual simulation interacts with the register 
proteins and ATP, as shown in Fig. 1, but these are omitted from Fig. 2 for clearer 
viewing of the execution sequence itself. A solid arrow represents a pathway that a 
signaling protein makes from the done site of one agent (tail of the arrow) to the trig- 
ger site of the next agent (head of the arrow). A dashed arrow represents a signaling 
protein’s pathway from the jump site of one agent to the trigger site of the next agent. 




Fig. 2. Schematic diagram of protein network to multiply registers A and B into register G (see 
text for discussion) 

The order in which the signals are propagated is indicated by a number in paren- 
theses along the signaling pathway. We will describe the sequence using an example 
in which registers A and B are initially set to 2 and 3, respectively, and G and H are 
both 0. A start signal (1) triggers the decrement of register A, so that we now have A 
= 1. We then (2) enter a loop contained in a box in the figure. In this loop, B is
decremented, (3) G is incremented, and (4) H is incremented. (It will become apparent
shortly why we must increment H.) The loop is repeated (5), beginning with the
decrement of B. After three passes through the loop, B = 0, and G = H = 3. This loop
has the effect of adding the value of B into registers G and H. The next attempt to
decrement B will find a zero-valued B register and therefore jump (6) to the next loop
to restore B from H. In this loop, (7) and (8), H is decremented and B incremented
until H = 0 and B = 3. When we attempt to decrement H again, it jumps (9) to
decrementing A (A = 0), and then the entire outer loop, (2) - (9), is repeated, so that G
= 6 (= 2 * 3, the original values of A * B). On the next attempt to decrement A, it
jumps (10) to whatever the next operation might be in a more extensive calculation. 
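
The ideal execution sequence just described can be checked with a small deterministic interpreter. The instruction encoding and the names `run_ram`, `MULTIPLY`, and `multiply` are assumptions made for this sketch. The comments map each instruction to the numbered signals of Fig. 2 as described in the text.

```python
from typing import Dict, List, Optional, Tuple

# ("inc", reg, next, None) or ("dec", reg, next, jump); None means exit.
Instr = Tuple[str, str, Optional[int], Optional[int]]

MULTIPLY: List[Instr] = [
    ("dec", "A", 1, None),   # 0: decrement A -> (2) enter loop; jump (10) exits
    ("dec", "B", 2, 4),      # 1: decrement B -> (3); jump (6) to the restore loop
    ("inc", "G", 3, None),   # 2: increment G -> (4)
    ("inc", "H", 1, None),   # 3: increment H -> (5) repeat the inner loop
    ("dec", "H", 5, 0),      # 4: decrement H -> (7); jump (9) back to decrement A
    ("inc", "B", 4, None),   # 5: increment B -> (8) repeat the restore loop
]


def run_ram(program: List[Instr], registers: Dict[str, int],
            max_steps: int = 100_000) -> Dict[str, int]:
    """Run the register machine without errors (ideal, deterministic behavior)."""
    regs = dict(registers)
    pc: Optional[int] = 0
    for _ in range(max_steps):
        if pc is None:
            return regs
        op, reg, nxt, jump = program[pc]
        if op == "inc":
            regs[reg] += 1
            pc = nxt
        elif regs[reg] > 0:          # "dec" on a nonzero register
            regs[reg] -= 1
            pc = nxt
        else:                        # "dec" on a zero register: jump
            pc = jump
    raise RuntimeError("step limit reached")


def multiply(a: int, b: int) -> Dict[str, int]:
    return run_ram(MULTIPLY, {"A": a, "B": b, "G": 0, "H": 0})


final = multiply(2, 3)
```

With A = 2 and B = 3, the run ends with G = 6, B restored to 3, and both A and H at zero, in agreement with the walk-through above.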

For this illustration, we have described the ideal, “correct” behavior of the net- 
work. However, any time a decrement occurs, it could jump even though the register 
is nonzero, due to the stochastic nature of this agent. So, in fact, there are numerous 
opportunities for errors in even this simple computation. 



3.2 Stochastic Computing, Errors, and Entropy 

We present results of stochastic simulations of encapsulants computing 
(A*B)+(C*D)+(E*F), where A, B, C, D, E, and F are initial register values. We
simulate a small population of encapsulants with identical internal component popu- 
lations and examine the error rates and configurational entropy (Sconfig) of this system 
as a function of time. For this analysis, we consider two encapsulants to be in the 
same configuration if all of the [+]reg and [-]reg/jump agents and signaling proteins
are in the same binding state and all of the register populations have the same associ- 
ated integer value. Sconfig of these small populations can be zero when all encapsulants 
are in the same configuration, so that these encapsulants are far from equilibrium. The 
stochastic nature of the jump operations means that such a set of identically config- 
ured encapsulants with Sconfig=0 will not remain so, and Sconfig will tend to increase 
with time (but not monotonically, as we show below). The maximum Sconfig condition 
is for each encapsulant to be in a unique state. 
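
The text does not give an explicit formula for Sconfig. One definition consistent with the limiting cases above (zero when all encapsulants share a configuration, maximal when every encapsulant is unique) is the Shannon entropy of the empirical distribution of configurations, normalized by ln N for N encapsulants. The sketch below assumes that definition. The function names and the encoding of a configuration as a hashable tuple are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def config_entropy(configs: Sequence[Hashable]) -> float:
    """Shannon entropy (in nats) of the observed distribution of configurations."""
    n = len(configs)
    counts = Counter(configs)
    # p * ln(1/p) summed over the distinct configurations.
    return sum((c / n) * math.log(n / c) for c in counts.values())


def normalized_entropy(configs: Sequence[Hashable]) -> float:
    """Entropy divided by its maximum, ln N, reached when every configuration is unique."""
    n = len(configs)
    if n < 2:
        return 0.0
    return config_entropy(configs) / math.log(n)


# A configuration is encoded here as a tuple of register values (A..F);
# binding states could be appended to the tuple in the same way.
ordered = [(2, 3, 1, 4, 0, 5)] * 10
one_error = [(2, 3, 1, 4, 0, 5)] * 9 + [(2, 3, 1, 4, 0, 4)]
all_unique = [(i, 0, 0, 0, 0, 0) for i in range(10)]
```

For ten encapsulants, the ordered population gives 0, the fully diverged population gives 1, and a single deviating encapsulant gives a low but nonzero value.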

The simulation begins with a population of ten duplicate encapsulants, but with 
randomly selected initial register values. The first phase of the simulation is to copy 
all register values from a single “starter” encapsulant to the other nine encapsulants, 
so that they all begin the calculation with the same values in registers A through F. 
This process occurs with some “yield,” i.e., there is a nonzero probability that one or 
more register copy operations will produce an incorrect register value. When the 
copying is completed, a synchronizing encapsulant is used to trigger the calculation. 
The calculation process then proceeds to completion, also with some “yield” of cor- 
rect register values. The averaged yields of final results were obtained from 220 
simulations. Fig. 3 (left panel) shows the average yield for the computation as a func- 
tion of ATP concentration. These results make clear that the dynamic, non- 
equilibrium behavior of these encapsulated protein networks is driven by the free 
energy of the ATP population. If the system does not have sufficient energy (ATP), it 
cannot perform the computation correctly. Fig. 3 (right panel) shows a scatter plot of 
final normalized entropy (Sconfig divided by its maximum) as a function of errors in 
the final answers. These results show that ending in a more highly ordered state (low 
entropy) is clearly correlated with high yields of correct computational results (low 
errors), so that maintaining far-from-equilibrium configurations is the desired out- 
come for these protein networks. The entropy captures all configurational differences, 
including those that do not disrupt the final register values, and this produces the 
scatter in the plot.

Fig. 3. (left) The fraction of encapsulants with the correct final results as a function of ATP
concentration. (right) Normalized final entropy vs. fraction of encapsulants with errors in their
final results




Fig. 4. Normalized entropy as a function of time for a single computation where all of the 
encapsulants are correctly copied, and all but one of the computations achieved the correct 
result 



The Sconfig as a function of time for a single computational run is shown in Fig. 4. 
We have chosen a case where all of the encapsulants are correctly copied, and all but 
one of the encapsulants achieved the correct result. Sconfig begins at a large value due 
to the initial randomized values of the registers in each encapsulant. The register-copying
phase is completed at t ~ 5000, in a totally ordered configuration of encapsulants
(Sconfig = 0). The calculation is initiated at t ~ 22000, and while each encapsulant
is performing its calculation independently of the others, their configurations
again diverge (Sconfig = 1). Finally, all of the encapsulants reach a finished state, with
all but one encapsulant reaching the same final state (low, but nonzero Sconfig). Thus,
this non-equilibrium process is cyclic in Sconfig.

The tendency of these stochastic computational processes to increase their Sconfig 
after a computational cycle is simply the slow equilibration of the configurational 
degrees of freedom. This clearly prevents arbitrarily long computations from being 
performed in the simple manner described above. The imperfect yield in the compu- 
tational processes described above has some similarities to the classic problem of 
communicating through a noisy channel. [12] Here we have a more general process 
of noisy computing processes (state transitions) in addition to noisy information 
transfer. Correct computing in general requires a mechanism for restoring Sconfig to 
zero periodically, with each restoration occurring before the distribution equilibrates 
too far. We are currently developing simulations of a hierarchical algorithm (i.e., in 
which the encapsulants act as agents) to restore low entropy in order to correct com- 
putational errors. 



4 Self-Assembling Software

4.1 From Protein Networks to Self-Assembling Software 

Our goal is to abstract the relevant properties of the proteins and their networks to 
devise a novel self-assembling software technology. The work on protein networks 
has provided valuable intuition about the important protein properties, agent design, 
interactions, etc. that enable self-assembly of physical structures and execution se- 
quences. It has also provided insight as to what properties of real physical systems 
should be omitted for efficient software technology. 

The issues of stochastics, equilibration, and error-correction discussed in the previ- 
ous section are real issues for any molecular computing with protein networks that 
might be attempted experimentally. However, to develop self-assembling software, 
we conveniently side-step these issues by choosing not to model the non-equilibrium, 
dissipative aspects of the protein interactions, and by using deterministic binding and 
unbinding event times. In addition, although it is instructive to demonstrate that a 
RAM machine can be constructed and a computation carried out using protein ma- 
chinery, building software with the fundamental increment and decrement operations 
would be inefficient and wasteful. Instead, each agent in the self-assembling software 
is designed to do anything from an arithmetic operation like adding two numbers, to 
reading or writing to a file, to implementing an entire algorithm. 

4.2 Example: Bank Transaction 

We have described the essential properties of our fundamental building blocks 
(agents) and infrastructure (Section 2), and we have described how protein networks 
can self-assemble a computation using those agents and infrastructure (Section 3). 
We now present an example of self-assembling software, chosen for its simplicity, to 
demonstrate how the protein-emulating agents can self-assemble to handle savings 
account withdrawals. 

Initially (Fig. 5), three agents are present, the Balance, the Withdrawal Process, 
and the Primer. The Balance stores the current balance for the account ($100.00).
The Withdrawal Process has the task of subtracting the withdrawal amount from the
current balance and updating the balance. The Primer acts as the “head” of a “poly- 
mer” of completed transactions, which can be walked later by a Monthly Account 
Report agent. Their binding sites are posted on the free-site list. 

When a Withdrawal occurs, binding events are scheduled between its free sites and 
the complementary sites of the Withdrawal Process. When the binding event handlers 
are executed, the Withdrawal Process exposes a site with a key of -102. This results 
in a binding event with the 102 site of the Balance (Fig. 6). When the Withdrawal 
Process is bound to both the Withdrawal and the Balance simultaneously, it subtracts 
the withdrawal amount (stored in the Withdrawal agent) from the current balance 
(stored in the Balance agent), and saves the result back to the Balance agent. The 
Withdrawal Process then changes the keys of the Withdrawal (Fig. 7), so that (1) it 
will not bind again to the Withdrawal Process (which would result in subtracting the 
same withdrawal again) and (2) it will bind to the Primer and leave a 103 site avail- 
able for the next Withdrawal to bind to. Lastly, the Withdrawal Process hides its own 
-102 site and resets to its original state. Now it is ready for another Withdrawal (Fig. 
7). Note that the Withdrawal Process exposes a site to bind to the Balance only tem- 
porarily. This leaves the Balance free to bind to other agents (such as a Deposit Proc- 
ess or Interest Compounder) when needed. 
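
The sequence of key changes in this example can be traced with a short procedural sketch. It is not event-driven and is not the authors' code: the `Agent` structure, the site names (`request`, `balance`, `access`, `prev`, `next`), and the assumption that the Primer initially exposes a 103 site are all introduced for illustration. Only the key values 101, 102, and 103 come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Agent:
    name: str
    keys: Dict[str, int] = field(default_factory=dict)   # site name -> key (0 = hidden)
    amount: float = 0.0                                   # stored data, if any
    next_link: Optional["Agent"] = None                   # next agent in the "polymer"


def complementary(a: int, b: int) -> bool:
    return a != 0 and a == -b


def process_withdrawal(withdrawal: Agent, process: Agent,
                       balance: Agent, tail: Agent) -> Agent:
    """Handle one withdrawal; return the new tail of the transaction polymer."""
    # The Withdrawal (101) must match the Withdrawal Process (-101).
    if not complementary(withdrawal.keys["request"], process.keys["request"]):
        raise ValueError("Withdrawal does not match the Withdrawal Process")
    # The binding handler exposes the -102 site, which binds the Balance (102).
    process.keys["balance"] = -102
    if not complementary(process.keys["balance"], balance.keys["access"]):
        raise ValueError("Balance site is not complementary")
    # Bound to both agents: subtract and save the result in the Balance.
    balance.amount -= withdrawal.amount
    # Re-key the Withdrawal so that it cannot be processed twice and instead
    # joins the polymer: -103 binds the tail, 103 waits for the next Withdrawal.
    withdrawal.keys = {"request": 0, "prev": -103, "next": 103}
    if not complementary(withdrawal.keys["prev"], tail.keys["next"]):
        raise ValueError("tail of the polymer has no matching site")
    tail.next_link = withdrawal
    # Hide the -102 site again, leaving the Balance free for other agents.
    process.keys["balance"] = 0
    return withdrawal


balance = Agent("Balance", {"access": 102}, amount=100.00)
process = Agent("Withdrawal Process", {"request": -101, "balance": 0})
primer = Agent("Primer", {"next": 103})
tail = primer
for amount in (30.00, 25.00):
    new_withdrawal = Agent("Withdrawal", {"request": 101}, amount=amount)
    tail = process_withdrawal(new_withdrawal, process, balance, tail)
```

After the two withdrawals, the Balance holds 45.00 and the Primer heads a two-element linked list. A completed Withdrawal can no longer bind to the Withdrawal Process because its request key has been invalidated.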

This very simple example illustrates all of the properties of the agent described in 
the Introduction. The agents have selective binding sites. When binding occurs, they 
actuate and/or expose, hide, or change the keys of other sites. This results in the self- 
assembly of an execution sequence (the withdrawal of funds from a bank account) 
and of a data structure (in this case, a linked list of completed withdrawals). 

Finally, when the Savings Account software module is completed, it is encapsu- 
lated (recall that, as discussed in Section 2, an encapsulant is analogous to a cell’s 
plasma membrane, isolating its contents from the external environment). Other 
banking modules, such as Auto Loans and Credit Cards, are also encapsulated. Each 
encapsulant has a Gate agent embedded in its surface, which selectively allows agents 
to enter, based on matching keys. In the overall banking system, when a Withdrawal 
occurs, its key matches only the Gate of the Savings Account module, so it enters and 
undergoes the same process described above. Similarly, credit card payments enter 
and undergo processing in the Credit Card encapsulant, etc. In our computational 
experiments, we have implemented all of the behaviors described here. In addition,
we have implemented agent and encapsulant transport into and out of encapsulants 
executing concurrently with the above example. 

4.3 Novel Capabilities of Self-Assembling Software 

External Override. The fact that an execution sequence is self-assembled, rather 
than hard-coded, leads to innovative and powerful capabilities in software developed 
by this approach. One example is the “external override.” This self-assembling software
construct overrides the behavior of the existing code, and it is imposed externally.
That is, the original source code “inside” the executable is not modified; instead,
additional agents are added from the outside to effect the override. Although there
are many similarities between our agent-based approach and object-oriented methods,
the external override provides functionality that is distinct from object-oriented inheritance,
as it allows removal of unwanted functionality (corresponding to part of a
method) from the “outside” of the existing (compiled) software structure during runtime.
This additional flexibility may be useful for enhancing software reuse.




Fig. 5. Initially, the Balance, Withdrawal Process, and Primer agents are available for binding. 
When a withdrawal occurs, the Withdrawal agent promptly binds to the matching sites of the 
Withdrawal Process 




Fig. 6. When the Withdrawal binds to the Withdrawal Process, the Withdrawal Process 
changes one of its keys from 0 to -102. The Balance then binds with the Withdrawal Process. 
The Withdrawal Process subtracts the withdrawal amount from the balance, and updates the 
balance 




Fig. 7. After completing the withdrawal transaction, the Withdrawal Process changes the keys 
of the Withdrawal, so that (1) it will not bind again to the Withdrawal Process and (2) it binds 
with the Primer, exposing a site with a key of 103 for binding later to other completed Withdrawals.
Then the Withdrawal Process is ready to handle a new Withdrawal

As an example, suppose that, after executing the withdrawal code described in the
previous section, we “realize” that a requirement was omitted: the system shall pre- 
vent the withdrawal of an amount exceeding the current balance. To accommodate 
this requirement, we implement an external override (not shown in the figures). We 
add a Change Keys machine into the simulation, which binds to the -101 key of the 
Withdrawal Process and modifies it to -106. In addition, a Verify Balance agent is 
added, with keys of -101, -104, and -105. Now, when a Withdrawal (which has a 
101 key) occurs, it binds with the Verify Balance agent instead of the Withdrawal 
Process. The Verify Balance agent compares the withdrawal amount to the account 
balance. If there are sufficient funds for the withdrawal, the Verify Balance agent 
changes the 101 key of the Withdrawal to 106 and releases it, enabling binding with 
the Withdrawal Process, and the transaction proceeds as before (Figs. 5-7). If there 
are insufficient funds, the Verify Balance agent changes the keys of the Withdrawal 
to some other values, resulting in binding with an Insufficient Funds agent instead. 

Note that with our dynamic self-assembly approach, this new function was inserted 
into the existing program without (a) rewriting the original source code, (b) compiling 
an entire new program, or (c) shutting down the already running software. 
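
The re-keying at the heart of the override can be illustrated with a key-dispatch table that is modified while the system is live. This is a deliberately reduced sketch: agents are collapsed to plain functions, the -104 and -105 sites of the Verify Balance agent are omitted, and the key 107 for the Insufficient Funds path is hypothetical (the text says only "some other values"). The keys 101 and 106 follow the text.

```python
from typing import Callable, Dict, List, Tuple

# A handler receives (amount, balance) and returns (new Withdrawal key, new balance).
Handler = Callable[[float, float], Tuple[int, float]]


def withdrawal_process(amount: float, balance: float) -> Tuple[int, float]:
    return 0, balance - amount            # key 0: transaction complete, site hidden


def verify_balance(amount: float, balance: float) -> Tuple[int, float]:
    # Re-key the Withdrawal: 106 leads to the Withdrawal Process,
    # 107 (hypothetical) leads to the Insufficient Funds agent.
    return (106 if amount <= balance else 107), balance


def insufficient_funds(amount: float, balance: float) -> Tuple[int, float]:
    return 0, balance                     # rejected: balance unchanged


def run(sites: Dict[int, Tuple[str, Handler]], amount: float, balance: float,
        key: int = 101) -> Tuple[float, List[str]]:
    """Follow a Withdrawal through whichever agents its current key matches."""
    path: List[str] = []
    while -key in sites:
        name, handler = sites[-key]
        path.append(name)
        key, balance = handler(amount, balance)
    return balance, path


# Original system: the Withdrawal (101) binds the Withdrawal Process (-101).
sites: Dict[int, Tuple[str, Handler]] = {-101: ("Withdrawal Process", withdrawal_process)}
before = run(sites, amount=150.0, balance=100.0)

# External override, applied to the same running table:
sites[-106] = sites.pop(-101)                          # Change Keys: -101 becomes -106
sites[-101] = ("Verify Balance", verify_balance)
sites[-107] = ("Insufficient Funds", insufficient_funds)
rejected = run(sites, amount=150.0, balance=100.0)
accepted = run(sites, amount=40.0, balance=100.0)
```

Neither `withdrawal_process` nor `run` is edited by the override. Only the keys change, which is the property the external override relies on.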

Internal “Re-wiring” and Optimization. The external override just described illus- 
trates the inherently dynamic nature of the self-assembling executable code. The 
structure of the executing code is assembling and disassembling all the while it is 
executing, with execution pathways that are driven dynamically by matching keys 
between agents. All that is required to change the execution pathway (to “re-wire”
what the code does, or to turn code on or off) is a change of keys.

Not only can the executable be re-wired from the outside with an external override, 
it can also re-wire itself from the inside, both what it does and when it does it. The 
code itself could be designed to detect its own properties, such as memory usage, 
speed, etc., and modify its own code and/or data structures in order to optimize in a 
particular way, such as using more memory in order to speed up a large calculation. 
Similarly, runtime priority can be modified for multiple concurrent self-assembly 
processes. Processor allocation is often implemented at the operating system level. It 
is easy to allocate different amounts of processing time to concurrent processes here 
by varying the future (virtual) event times associated with each process. Processes with
shorter event times are activated more frequently.
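
The effect of virtual event times on processor allocation can be seen in a toy scheduler in which each process reschedules itself after a fixed virtual period. The function name `run_processes` and the use of a constant period per process are assumptions made for this sketch.

```python
import heapq
from collections import Counter
from typing import Dict


def run_processes(periods: Dict[str, float], t_max: float) -> Counter:
    """Count activations of self-rescheduling processes up to virtual time t_max."""
    queue = [(period, name) for name, period in periods.items()]
    heapq.heapify(queue)
    activations: Counter = Counter()
    while queue and queue[0][0] <= t_max:
        t, name = heapq.heappop(queue)
        activations[name] += 1
        # Schedule the process's next event at its own virtual period.
        heapq.heappush(queue, (t + periods[name], name))
    return activations


counts = run_processes({"fast": 1.0, "slow": 4.0}, t_max=12.0)
```

With periods of 1 and 4 time units, the fast process is activated four times as often as the slow one over the same virtual interval.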

Situation Detection. Another aspect of the dynamic nature of the executable code is 
the fact that the execution sequences are self-assembled whenever binding sites “find” 
each other. An agent can wait passively with its available sites on the free-site list 
until complementary sites are available. They could become available immediately, 
or they might not become available until a million events have been handled. We 
harness this “uncertainty” to implement a self-assembling software construct called a 
“situation.” Situations provide a mechanism for “sensing” whenever certain condi- 
tions or events occur by providing passive agents with empty binding sites. These 
binding sites correspond to the conditions of interest, and when all sites are bound, 
the sensing agent is activated to report or trigger a desired response. Situation detec- 
tion is asynchronous. It is also passive, in that no repeated active polling by the agent 
itself is required to detect the events. It simply waits with its sites on the free-site list.
Situations can be added at runtime to compiled code to monitor the code structure
itself. For example, the activity of other agents, their status (number of bound and 
unbound sites, active or dormant), their functionality, and the numbers and types of 
agents present in an encapsulant can all be determined automatically. Our agent 
binding events have some similarities to asynchronous message passing between 
concurrent servers, e.g., as in the JOULE language. In contrast to message passing, 
our events provide bi-directional communication in which both agents “know” that 
they both have been triggered, and both execute code based on this knowledge. 
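
The passive character of a situation can be sketched in a few lines of Python. This is only a simplified illustration under our own assumptions: the class name `Situation`, the `bind` method, and the string condition names are hypothetical, and the real agents bind through matching keys rather than through direct method calls.

```python
class Situation:
    """Passive sensing agent that fires once every binding site is bound."""

    def __init__(self, conditions, on_detect):
        self.free_sites = set(conditions)  # sites waiting on the free-site list
        self.bound_sites = set()
        self.on_detect = on_detect         # response triggered on detection

    def bind(self, condition):
        """Called by the event that satisfies `condition`; the agent never polls."""
        if condition not in self.free_sites:
            return  # no complementary site is free, so nothing happens
        self.free_sites.remove(condition)
        self.bound_sites.add(condition)
        if not self.free_sites:
            # All sites are bound: activate and report.
            self.on_detect(sorted(self.bound_sites))

reports = []
situation = Situation({"condition_a", "condition_b"}, reports.append)
situation.bind("condition_a")
situation.bind("unrelated_event")
situation.bind("condition_b")
print(reports)
```

The agent does no work between events; it is only touched when some other event offers a complementary site.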

“Monitoring” and “querying” are special cases of the override and situation proc- 
esses, and are used to inspect the code or the status of its agents. They are like the 
external override in that they are implemented by inserting agents into the execution 
pathway during runtime. They are like the situation in that they can sense sought- 
after conditions of the running code and report on activity or on the data that are be- 
ing manipulated. The functionality of the agents being monitored is not affected dur- 
ing monitoring. Monitoring and querying only differ in their usage. Monitoring is 
used to “keep an eye on” some aspect of the code. For example, a goal such as “re- 
port every time this credit card is used in two different cities on the same day” would 
be implemented as a monitor. Once the monitoring agent has detected the situation 
and reported it, it passively waits for the situation to arise again. In contrast, a query 
is used to determine something immediate, and then self-destructs and is removed 
from the simulation. For example, if a program is using up a tremendous amount of 
CPU without producing any output, a query can be constructed to determine what the 
program (which was perhaps written by another programmer) is doing right now. 
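
The difference in usage between a monitor and a query can be sketched as follows. The class `Sensor`, its `persistent` flag, and the event names are assumptions chosen for illustration only; the sketch models the re-arming and self-removal behavior, not the insertion of agents into an execution pathway.

```python
class Sensor:
    """Sensing agent used either as a monitor (persistent) or as a query (one-shot)."""

    def __init__(self, conditions, report, persistent):
        self.conditions = frozenset(conditions)
        self.bound = set()
        self.report = report
        self.persistent = persistent  # True: monitor, False: query
        self.alive = True

    def notify(self, condition):
        if not self.alive or condition not in self.conditions:
            return
        self.bound.add(condition)
        if self.bound == self.conditions:
            self.report(sorted(self.bound))
            self.bound.clear()          # free the sites again
            if not self.persistent:
                self.alive = False      # a query removes itself after reporting

monitor_log, query_log = [], []
monitor = Sensor({"x", "y"}, monitor_log.append, persistent=True)
query = Sensor({"x", "y"}, query_log.append, persistent=False)
for event in ["x", "y", "x", "y"]:
    monitor.notify(event)
    query.notify(event)
print(len(monitor_log), len(query_log))
```

Here the monitor reports each time the situation recurs, while the query reports once and then ignores all further events.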



5 Summary 

In this paper, we have shown two ways in which dynamic self-assembly can be used 
to perform computation, via stochastic protein networks and self-assembling soft- 
ware. We described our protein-emulating agent-based simulation infrastructure, 
which is used for both types of computations. The agents have a few properties suffi- 
cient for dynamic self-assembly: they have selective binding sites, and when binding 
occurs, they actuate and/or expose, hide, or change the keys of other sites. Examples 
of protein-network-based computation and self-assembling software were presented. 
We described some novel programming constructs that are enabled by the inherently 
dynamic nature of the self-assembling executable code: the “situation”, the “external 
override” for software reuse, and the ability to monitor or query preexisting code as it 
executes. These novel capabilities demonstrate that the self-assembling software 
approach is dynamic and adaptable. Current and future work will focus on the issues 
of robustness, evolutionary (autonomous) adaptability, and self-healing, as well as 
code generation from user-specified goals. 

We thank Gerry Hays, Wil Gauster, and Julie Phillips for their support of this re- 
search effort. Sandia is a multiprogram laboratory operated by Sandia Corporation, a 
Lockheed Martin Company, for the United States Department of Energy’s National 
Nuclear Security Administration under Contract DE-AC04-94AL85000.

Abstract. The proposed system is designed to support the dynamic formation
of human relationships between strangers in networked virtual space. This sys- 
tem attempts to locate the most suitable virtual space for the content of their 
conversation. The system analyzes the conversation by using morphological 
analysis and provides candidate virtual spaces related to the conversation topic. 
Users can change the conversational space freely by choosing one of the candi- 
dates. The new virtual space, tailored to the participants’ conversation, can po- 
tentially enhance the quality and satisfaction the participants perceive during 
their virtual relationship. In this paper, we describe the implementation and
experimental evaluation of our proposed system. Our results show that by placing
users in virtual settings that are “aware” of their conversation topics, we can
create a compelling and powerful communication paradigm, potentially
enhancing the quality of the resulting human relationship.



1 Introduction 

With the rapid and widespread diffusion of the Internet, people can live symbiotically 
with many strangers who live anywhere on earth through this network. Information 
networks are now an integral part of everyday life of many people world-wide. The 
symbiosis between humans and information networks, and the many challenges this 
symbiosis carries, have to be addressed. In such communications, people usually use a
range of communication tools such as email, chat, and elaborate 3D virtual spaces. 
Generally, with email or chat, they communicate with each other by using only text. 
In contrast, with a 3D virtual space, they can communicate by using various multi- 
media contents, such as text, audio, images, and movies. These media promise to 
enrich on-line conversation. In addition, a 3D virtual space can provide a high-quality 
environment that is very similar to the real world. Over the past decade, several stud- 
ies have focused on developing 3D virtual spaces [1-10]. 

Communication in a virtual space has a major advantage: it allows users to express 
their emotions by using avatars that behave as themselves. Moreover, as described 
above, a virtual space offers the possibility of realizing effective and efficient com- 
munication beyond the real world by using various rich multimedia contents. 
Moreover, since users can move or travel freely inside the virtual space, where many
people from all over the world gather at the same time, they have opportunities to
make a lot of friends from anywhere on earth. However, in communication in virtual 
space people sometimes meet others by chance, so there is the problem that such an 
environment lacks knowledge about potential partners, such as a partner’s cultural 
background, age, gender, and so on. Furthermore, it is also difficult to perceive the 
partner’s character in a virtual meeting place, unlike situations in the real world. 
Therefore, in the early stage of conversation, the topics for communication
become scarce and the conversation does not become lively, so human relationships
between users do not develop smoothly.

In order to solve this problem, we propose a novel system that animates the con- 
versation of those who meet by chance in a virtual space and that supports the dy- 
namic formation of human relationships. Our proposed system analyzes users’ con- 
versations in real time and moves them to the virtual space relevant to the topics of 
their conversation dynamically. Users can share the images of their conversation, 
recall their past episodes [11], and generate new topics spontaneously by using the 
information obtained from the new space. By this technique, we believe that our sys- 
tem can support communication between strangers in virtual space and can also pro- 
mote the formation of human relationships. 



2 Communication in Networked Virtual Space 

When people communicate with others through the Internet, it is common to use tools 
that transmit their intentions only by text, such as chat and email. However, due to the 
rapid development of network and virtual reality technologies, it has become possible 
to communicate in a networked virtual space as if users were in the real world. In 
such a networked virtual space, two or more users can share the space and their exis- 
tence, allowing communication among strangers who meet by chance. Moreover, as 
in the real world, they can also share information and experiences obtained in the 
virtual space. Furthermore, the networked virtual space can remove physical restric- 
tions in the real world such as nationality, culture, and distance. 

Thus, we assume that communication in a networked virtual space would be richer 
than conventional communication using only text. However, the conversational part- 
ners whom people meet in a networked virtual space are usually strangers. As a re- 
sult, they have to start communicating under the condition of having no knowledge 
about their partner. Furthermore, when people start their conversation in the real 
world, they sometimes use as topics the information obtained from the place or space 
where they met. This supports people in understanding their partner’s character and 
forming a human relationship dynamically. 

In order to solve these problems, several studies over the past few years have fo- 
cused on supporting communication in a networked virtual space. Helper Agent [8] 
provides a topic to participants in order to support building their common ground 
when the topics for conversation become scarce and the conversation has stagnated. 
However, since the agent intervenes between participants’ conversation as a third 
person, this system can make participants feel that they are not conducting the
conversation on their own. Comic Chat [9] tried to express the history and situation of
participants’ conversation intuitively by using comics. This system analyzes
participants’ conversation and expresses their emotions and who is talking by comics.
Appropriate comics are generated only if appropriate keywords are found in their
conversation, but such keywords are not found in many cases, and thus the system
cannot always generate appropriate comics. Talking in Circles [10] is a system that 
supports naturalistic group interaction. Participants communicate primarily by speech 
and are represented as colored circles in a two-dimensional space. The circles serve as 
platforms for the display of their identity, presence and activity. Although this system 
allows expressing intuitively the situation of participants existing in the virtual space, 
it does not support either animated conversations or human relationship formation 
between strangers. 

In contrast, our proposed system supports the animation of conversations and the 
formation of human relationships dynamically by using the place or space itself,
which is the greatest feature of the networked virtual space. This system analyzes
participants’ conversation in real time and then calculates the relevancy ratios of the
spaces by comparing topics of their conversation with keywords that describe the
virtual space stored in the database. Then, the system shows the users some spaces
related to the topics as candidates. Participants can change the conversational space
into a new space dynamically by choosing one of the candidates. In this way, they can
share the images of their conversation and generate new topics spontaneously by
using the information obtained from the new conversational space. 



3 Composition of Proposed System 

Our proposed system consists of a server and two clients as shown in Fig. 1. The 
server manages the Spatial Database, the Space Retrieval System, the Virtual Space 
Manager, and the Chat Manager, which manages participants’ conversation. The 
Spatial Database stores a set of 3D virtual spaces where participants communicate and 
a set of keywords that describe the space for the retrieval. The screen of each client 
consists of three windows; the first window (upper-left part of client screen in Fig. 1) 
shows the virtual space where participants are, the second one (bottom-left part of 
client screen in Fig. 1) indicates some candidates of virtual space that are well 
matched to the topics of the participants’ conversation, and the third one (right part of 
client screen in Fig. 1) displays the participants’ conversation as a chat log. 

The space retrieval process is performed every time a participant writes and sends a 
sentence to the other. The written sentence is transmitted to the server through the 
network and passed to the Space Retrieval System. This system analyzes the sentence 
by using morphological analysis and then calculates the relevancy ratios by 
comparison with keywords that describe the virtual space. The list of candidate virtual 
spaces is transmitted to both of the clients, and the clients display preview images of 
these candidates. Finally, participants can change the conversational space into a new 
space dynamically by choosing one of the candidates.

Fig. 1. System overview.



3.1 Composition of Spatial Database 

The Spatial Database consists of spatial records. Each spatial record is constructed of 
four layers of data: the virtual space, a set of feature position coordinates, the key- 
words describing each feature position (the position keywords), and the virtual space 
keywords describing the space itself (upper-left part of Fig. 1). In the virtual space, 
participants can walk, run, swim, and jump. Since each virtual space is large and 
consists of several places whose atmospheres are different from others, we determine 
these places as the feature positions. The position keywords and the virtual space 
keywords express characteristics of their positions and virtual spaces, and they are 
defined in advance and stored in the database. These keywords are used to retrieve 
the virtual space that is relevant to the participants’ conversation. The retrieval system
performs matching between the keywords extracted from their conversation and the position
and virtual space keywords by calculating the relevancy ratios. The system displays
preview images of the candidates of spaces whose relevancy ratios are high. We build 
the Spatial Database by the following procedure.

We select several feature positions manually in each virtual space and store them in
the Spatial Database. In order to register the position and virtual space keywords that 
express characteristics of feature positions and the virtual space, database registration 
workers enter a virtual space and move to the place corresponding to each feature 
position. Then, the workers register the position and virtual space keywords accord- 
ing to the following procedure. 

1. Workers investigate the atmosphere of their surroundings by looking around and 
moving for a while in the feature position. 

2. They input a sentence describing what they feel and imagine at the position.

The workers repeat this process based on the number of feature positions. The set 
of sentences obtained by this process is analyzed by the following process. 

1. A sentence obtained by the database registration workers is analyzed by the 
morphological analysis module. Then, noun, adjective, and unknown words are ex- 
tracted. 

2. From the extracted words, words that are inappropriate as the keyword (such as a 
face character) are deleted. 

3. The number of repetitions of extracted words is calculated, and then the words 
are sorted in descending order of frequency. 

4. Only 10 words are stored in the Spatial Database as position keywords based on 
frequency established in the previous step. 

5. Steps 1-4 are repeated for all feature positions. 

Finally, the virtual space keyword is extracted from the position keywords. Only 10 
position keywords are adopted as the virtual space keyword based on word frequency. 
Thus, the Spatial Database is built by carrying out the above processing. 
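
The frequency-based selection in the steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: whitespace splitting stands in for the morphological analysis module, the `stopwords` argument stands in for the removal of inappropriate words, and the function names `top_keywords` and `space_keywords` are hypothetical.

```python
from collections import Counter

def top_keywords(sentences, stopwords=frozenset(), limit=10):
    """Return up to `limit` most frequent words found in `sentences`.

    Whitespace splitting is a stand-in for morphological analysis, and
    `stopwords` is a stand-in for deleting words unsuitable as keywords.
    """
    counts = Counter()
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in stopwords:
                counts[word] += 1
    # most_common sorts in descending order of frequency.
    return [word for word, _ in counts.most_common(limit)]

def space_keywords(position_keyword_lists, limit=10):
    """Pick the virtual space keywords from the position keywords by frequency."""
    counts = Counter(w for keywords in position_keyword_lists for w in keywords)
    return [word for word, _ in counts.most_common(limit)]

position = top_keywords(["blue sea", "sea breeze", "warm sea breeze"], limit=2)
space = space_keywords([["sea", "sand"], ["sea", "palm"]], limit=1)
print(position, space)
```

With `limit=10`, the two functions correspond to keeping 10 position keywords per feature position and 10 virtual space keywords per space.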



3.2 Details of Retrieval System 

The Retrieval System computes the relevancy ratios by comparing the position and 
virtual space keywords with the words extracted from the participants’ conversation 
and then integrating the results of both relevancy ratios. This calculation of both rele- 
vancy ratios is made because it is not sufficient to calculate either the relevancy ratio 
of the position keywords or that of the virtual space keywords alone. This is because 
the virtual space keywords include global information, e.g. a city or a country, while 
the position keywords include local information, e.g. a classroom or a parking lot.
These two relevancy ratios should be synthesized in order to determine the final rele- 
vancy ratio for the virtual space. Accordingly, the system is able to retrieve a space 
that combines global and local information, such as a country parking lot and an 
urban parking lot. However, since each virtual space is large and consists of several 
places whose atmospheres are different from others, the concept of the virtual space is 
typically ambiguous as opposed to a country or a city. Therefore, it is difficult to 
retrieve these virtual spaces exactly. In order to retrieve the virtual space, it is
necessary to form the concept of the virtual space exactly. We use the Associatron, which
is a kind of neural network that is specialized to form a concept [12]. The Associatron
memorizes the sum of the auto-correlation matrices of the vectors. If a certain vector 
is input into the Associatron, it recalls a vector pattern that is similar to the input 
vector. The concept of the virtual space is formed, and the virtual space is retrieved 
exactly by using this technique. 




Fig. 2. Retrieval flow. 



3.3 Flow of Retrieval Processing 

Figure 2 shows the flow of the retrieval process. A participant’s utterance serves as a 
trigger to start the retrieval. It is passed to the morphological analysis module and 
separated into noun, adjective, and unknown words. In these extracted words, the 
system deletes words that are inappropriate as keywords. The Data Base Manager 
(DBM) registers the passed keyword into the appearance keyword history list. Be- 
cause we want to retrieve a virtual space that is similar to the most recent participants’ 
conversation, only keywords extracted from the newest five utterances are registered 
in the appearance keyword history list. Moreover, in order to retrieve efficiently, 
these keywords should match the keywords registered in the Spatial Database. After 
the registration is completed, the DBM sends a signal to the position and virtual space 
retrieval modules to start retrieval. Each of the retrieval modules calculates the
relevancy ratios between the words in conversation and the position keywords/virtual
space keywords. The relevancy ratio computation module adds the relevancy ratio of
the virtual space to the relevancy ratio of the feature position according to a weight. 
Consequently, the final relevancy ratio is computed and the characteristics of the 
feature positions are combined with those of the virtual space. The relevancy ratio list 
is ordered according to high relevancy ratios. The System refers to the relevancy ratio 
list and provides preview images of candidate virtual spaces related to the contents of 
the participants’ conversation. 
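
The appearance keyword history list, limited to the newest five utterances and to keywords that are registered in the Spatial Database, could be kept as in the following sketch. The class `KeywordHistory` and its method names are assumptions introduced for illustration.

```python
from collections import deque

class KeywordHistory:
    """Keeps database keywords from only the most recent utterances."""

    def __init__(self, database_keywords, max_utterances=5):
        self.database_keywords = set(database_keywords)
        # A bounded deque silently drops the oldest utterance when full.
        self.recent = deque(maxlen=max_utterances)

    def add_utterance(self, words):
        # Keep only words that also appear in the Spatial Database.
        self.recent.append([w for w in words if w in self.database_keywords])

    def keywords(self):
        return {w for utterance in self.recent for w in utterance}

history = KeywordHistory({"sea", "hawaii", "hotel"})
for words in [["sea"], ["x"], ["x"], ["x"], ["x"], ["hawaii", "trip"]]:
    history.add_utterance(words)
print(history.keywords())
```

After six utterances the first one has been dropped, so its keyword no longer influences retrieval.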



3.3.1 Database Manager (DBM) 

As a pre-processing of the retrieval, in order to make retrieval easy and effective for 
each retrieval module, the DBM generates a retrieval index from the position and 
virtual space keywords and a retrieval query from the appearance keyword history 
list. 

First, in order to generate a retrieval index, the DBM calculates the total number of
position and virtual space keywords $u$ registered in the Spatial Database and then
acquires the set of all keywords $W = (w_1, w_2, w_3, \ldots, w_u)$, where a keyword that appears
several times is adopted only once. Next, the position keywords vector $P_i$ is generated
from $W$, and the position keyword $A_i$ ($i = 1 \sim m$) of each feature position is calculated
by using formulas (1) and (2). The virtual space keywords vector $S_j$ is generated from
$W$, and the virtual space keyword $B_j$ ($j = 1 \sim n$) of each virtual space is calculated by
using formulas (3) and (4). Here, the variable $m$ means the number of feature
positions, and the variable $n$ means the number of virtual spaces.





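$$P_i = (p_{i1}, p_{i2}, \ldots, p_{iu}) \qquad (1)$$

$$p_{ik} = \begin{cases} 1 & (w_k \in A_i) \\ 0 & (w_k \notin A_i) \end{cases} \qquad (2)$$

$$S_j = (s_{j1}, s_{j2}, \ldots, s_{ju}) \qquad (3)$$

$$s_{jk} = \begin{cases} 1 & (w_k \in B_j) \\ 0 & (w_k \notin B_j) \end{cases} \qquad (4)$$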



Accordingly, the position and virtual space keywords are converted into the re- 
trieval index. Then, in order to form the concept of the virtual space for the Asso- 
ciatron, associative memory matrix M is generated from the virtual space keywords 
vector Sj by using formula (5). 



$$M = \sum_{j=1}^{n} S_j^{T} S_j \qquad (5)$$



Moreover, in order to generate the retrieval query from the appearance keyword
history list, the conversation keywords vector C is generated from W, and the
appearance keyword history list H is calculated by using formulas (6) and (7). The
conversation keywords vector C is a set of keywords that appear both in the Spatial Database
and the participants’ conversation. This permits the contents of conversation to be
associated with the Spatial Database.

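$$C = (c_1, c_2, \ldots, c_u) \qquad (6)$$

$$c_k = \begin{cases} 1 & (w_k \in H) \\ 0 & (w_k \notin H) \end{cases} \qquad (7)$$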



Each retrieval module uses four data obtained by the above processes, i.e. the posi- 
tion keywords vector Pi, the virtual space keywords vector Sj, the associative memory 
matrix M, and the conversation keywords vector C. 
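
As a hedged illustration of this pre-processing, the following Python sketch builds the keyword set W, converts keyword sets into binary vectors, and sums the auto-correlation (outer product) matrices of the virtual space vectors to obtain M. The function names and the three-word vocabulary are assumptions made for the example, and plain nested lists are used instead of a matrix library.

```python
def build_vocabulary(keyword_sets):
    """W: every registered keyword once (sorted here for reproducibility)."""
    return sorted(set().union(*keyword_sets))

def to_vector(vocabulary, keywords):
    """Binary vector with 1 where the vocabulary word is in `keywords`."""
    return [1 if w in keywords else 0 for w in vocabulary]

def memory_matrix(space_vectors):
    """Sum of the outer products of each virtual space vector with itself."""
    size = len(space_vectors[0])
    M = [[0] * size for _ in range(size)]
    for s in space_vectors:
        for r in range(size):
            for c in range(size):
                M[r][c] += s[r] * s[c]
    return M

spaces = [{"beach", "sea"}, {"sea", "city"}]   # toy virtual space keywords
W = build_vocabulary(spaces)
S = [to_vector(W, keywords) for keywords in spaces]
M = memory_matrix(S)
C = to_vector(W, {"beach", "trip"})            # "trip" is not in W and is ignored
print(W, S, M, C)
```

Because C is built from W, words that occur only in the conversation drop out automatically, which matches the role of the conversation keywords vector described above.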



3.3.2 Position Retrieval Module 

The position retrieval module computes the relevancy ratios of the feature positions
by comparing the conversation keywords vector C and the position keywords vector $P_i$
generated by the DBM. First, this module calculates the number of matching keywords
($PA_i$) by computing the inner product of the conversation keywords vector C and the
position keywords vector $P_i$. Then, this result is divided by 10, which is the number of
position keywords of each feature position. This gives the relevancy ratios of the
feature positions $PR_i$ by formula (8).

$$PR_i = \frac{PA_i}{10} \times 100\,(\%) \qquad (8)$$
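
The position relevancy ratio is an inner product of two binary vectors scaled by the number of position keywords. A minimal sketch, assuming binary lists of equal length and a hypothetical function name `position_relevancy`:

```python
def position_relevancy(conversation_vector, position_vector, keywords_per_position=10):
    """Relevancy ratio in percent: matching keywords divided by keywords per position."""
    matches = sum(c * p for c, p in zip(conversation_vector, position_vector))
    return 100 * matches / keywords_per_position

# Toy vectors over a 12-word vocabulary; three conversation keywords match.
C = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
P = [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0]
ratio = position_relevancy(C, P)
print(ratio)
```

Three of the ten position keywords occur in the conversation here, so the ratio is 30 percent.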



3.3.3 Virtual Space Retrieval Module 

The virtual space retrieval is performed by using the Associatron. First, in order to 
recall the virtual space that is similar to the participants’ conversation, the associative 
memory matrix M and the conversation keywords vector C, which are generated by 
the DBM, are substituted for formula (9). The recalled output is defined as the asso- 
ciative keywords vector R. 

$$R = \phi_\theta(\phi_\theta(M)\,C) \qquad (9)$$



Here, $\phi_\theta$ is a quantization function that transforms the matrix into Boolean values,
as shown in formula (10).

$$\phi_\theta(x) = \begin{cases} 1 & (x > \theta) \\ 0 & (0 \le x \le \theta) \end{cases} \qquad (10)$$



The threshold value $\theta$ in formula (10) is determined for every retrieval so that the
number of adopted keywords of the associative keywords vector R may become less
than the variable r. The variable r is a constant to make the retrieval stable. In
calculating the relevancy ratio of a virtual space $SR_j$, the number of keywords ($SA_j$) is
computed by calculating the inner product of the associative keywords vector R and
the virtual space keywords vector $S_j$. This result is divided by 10, which is the
number of virtual space keywords. This gives the relevancy ratio of each virtual space $SR_j$
(formula (11)).

$$SR_j = \frac{SA_j}{10} \times 100\,(\%) \qquad (11)$$
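
The recall step and the virtual space relevancy ratio can be sketched as follows. Several points are assumptions of this sketch and are not stated in the text: the same threshold is used for the inner and outer quantization, the threshold is raised in integer steps starting from 0 until fewer than r keywords are adopted, and the small matrix M is hand-built for a toy vocabulary (beach, city, sea) with two virtual spaces.

```python
def quantize(value, theta):
    """Quantization function: 1 above the threshold, 0 otherwise."""
    return 1 if value > theta else 0

def recall(M, C, theta):
    """Quantize M, multiply by C, and quantize the result element-wise."""
    activations = [
        sum(quantize(m, theta) * c for m, c in zip(row, C)) for row in M
    ]
    return [quantize(a, theta) for a in activations]

def recall_with_limit(M, C, r):
    """Raise theta (assumed integer steps) until fewer than r keywords are adopted."""
    if r < 1:
        raise ValueError("r must be at least 1")
    theta = 0
    R = recall(M, C, theta)
    while sum(R) >= r:
        theta += 1
        R = recall(M, C, theta)
    return R, theta

def space_relevancy(R, S, keywords_per_space=10):
    """Relevancy ratio in percent of one virtual space for the recalled vector R."""
    matches = sum(x * s for x, s in zip(R, S))
    return 100 * matches / keywords_per_space

M = [[1, 0, 1], [0, 1, 1], [1, 1, 2]]  # toy memory of spaces [1,0,1] and [0,1,1]
C = [1, 0, 0]                          # the conversation mentions only "beach"
R, theta = recall_with_limit(M, C, r=3)
ratios = [space_relevancy(R, S) for S in ([1, 0, 1], [0, 1, 1])]
print(R, theta, ratios)
```

In this toy case the single conversation keyword recalls its associated keyword as well, so the first space scores higher than the second even though only one word was spoken.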



3.3.4 Relevancy Ratio Computation Module 

In the relevancy ratio computation module, the final relevancy ratio Fi, which associ- 
ates the feature positions with the virtual space, is acquired by adding the relevancy 
ratio of the virtual space to that of the feature position with weights (formula (12)). 

$$F_i = (a \cdot PR_i + b \cdot SR_j) \times 100\,(\%) \qquad (1 \le i \le m;\ 1 \le j \le n) \qquad (12)$$

All relevancy ratios of all virtual spaces are calculated by repeating the above 
processing m times. Finally, this result is output as the relevancy ratio list in the order 
of high relevancy ratio. 
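
A minimal sketch of the weighted combination and ranking is given below. It assumes that the two relevancy ratios are already expressed in percent and that the weights a and b sum to 1, so the sketch returns the weighted sum directly and does not apply a further scaling factor; the weight values, identifiers, and function name `rank_positions` are hypothetical.

```python
def rank_positions(positions, space_ratio, a=0.5, b=0.5):
    """Combine position and virtual space relevancy ratios and sort descending.

    positions   : list of (position_id, space_id, position_ratio_percent)
    space_ratio : dict mapping space_id to its relevancy ratio in percent
    a, b        : hypothetical weights for the two ratios
    """
    scored = [
        (a * ratio + b * space_ratio[space], position, space)
        for position, space, ratio in positions
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored

positions = [("p1", "s1", 30.0), ("p2", "s1", 10.0), ("p3", "s2", 50.0)]
ranking = rank_positions(positions, {"s1": 20.0, "s2": 10.0})
print(ranking)
```

The first entries of the sorted list correspond to the candidates that would be offered to the participants.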



3.4 Display Candidate Virtual Spaces 

The system refers to the relevancy ratio list acquired as explained above and displays 
outlines with 2D images of the candidate virtual spaces in order of high relevancy 
ratio. Moreover, the relevancy ratio list is updated every time a participant speaks, 
and the candidate virtual spaces provided to participants are also updated each time. 
Participants can change the current conversational virtual space to a new virtual space 
by dynamically clicking and choosing one of the preview images of candidate virtual 
spaces. 



4 Implementation of Proposed System 

For the Spatial Database, we prepare 100 virtual spaces with different characteristics 
and select 3-5 feature positions in each virtual space manually (in total, 400 posi- 
tions). About 1500 position and virtual space keywords are registered for these 400
positions. One female and six male students in their twenties worked to register these
position and virtual space keywords.

Figure 3 shows an example user’s screen of the implemented system. As an im- 
plementation environment, we use three desktop PCs (OS: Windows 2000, CPU: Intel 
Pentium 3 800 MHz, RAM: 512 MB) as the server and clients, Borland C++ Builder
5 (a compiler), Chasen (a morphological analysis library) [13], and the Cyber-front
Company’s Half-Life (a 3D engine). The upper-left window in Fig. 3 displays the 
present conversational virtual space (virtual space window). This is displayed in the 
participant’s first person point of view. Participants can look at their partner’s avatar 
in the virtual space window. A participant’s utterance is displayed in the upper-left 
part of the window. The right window presents the participants’ conversational log
(chat log window). Preview images of the candidate virtual spaces are displayed in
the lower-left window (candidates window) by using 2D thumbnail images, and these 
preview images are displayed in order of high relevancy ratio. Due to the window 
size, we determined the number of preview images to be eight. Participants can 
change the conversational virtual space by clicking and choosing one of the preview 
images of the virtual spaces. The virtual space can be altered following input from a 
single user, even without consensus between participants. Participants are
transferred into a new virtual space and are placed at nearly the same relative positions
as in the original virtual space. 







[Figure 3: screenshot of the client screen. The chat log window shows a sample conversation in which YOSHI and RYO talk about summer vacation plans, RYO recommends Hawaii, and YOSHI changes the virtual space.]



Fig. 3. Screen of the proposed system. 



5 Evaluation of Proposed System 

We conducted an experiment to evaluate our proposed system. Our main goal was to 
determine whether it could be helpful for the dynamic formation of human relation- 
ships by animating the conversation between strangers under difficult situations for 
communication, such as meeting in a virtual space by chance. 



5.1 Experimental Conditions 

In order to place participants in a difficult situation for communication, we selected as 
subjects six male students in their twenties who were unacquainted with each other. 

As the experimental environment, we prepared two rooms so that participants 
would not meet face to face. We connected all hosts (one server and two clients) by 
using a LAN (100 Mbps).

We prepared three experimental conditions: conversation with text chat, a 3D chat,
and our proposed system. With the text chat, participants talked through text only,
without any information about their partner. The 3D chat is our proposed system
handicapped by removing the function to change the conversational virtual space. A
participant talked with a different partner for 20 minutes in each of the three
conditions. Moreover, the three conditions were changed so that they were all used in each
participant’s group. Each participant performed the tasks in a different order to
balance the effect of task order on the experimental results.

Before starting the experiment, we explained to participants how to use the system, 
and then they played and practiced with it for 20 minutes. We asked them to talk 
about anything to get to know each other. Every time they finished talking in each 
condition, we asked them to answer a questionnaire. After they completed all tasks, 
we interviewed them. The total experimental time for one participant was about 2 
hours, including the system training, tasks, questionnaires, and interview. 

The questionnaire consists of 33 items about the conversation, themselves, and 
their conversational partner. They were asked to answer all items by rating on a 7- 
point scale (-3 ~ 3), where 3 is the highest score. In the interview, we asked them 
questions about the system and their conversation. 



5.2 Comparison of Text Chat and Proposed System 

We obtained significant differences in 13 items on the questionnaires from the results
of t-test comparisons between the text chat and the proposed system (Table 1). From
the left of Table 1, the columns are questionnaire items, average rating of the text chat,
average rating of our proposed system, and t-value. The text chat showed superiority
in none of the questionnaire items. These results are summarized as follows.

• They could communicate with each other more easily ((a) of Table 1). 

With our proposed system, they reported that making topics for conversation be- 
came easy, the contents of conversation were imagined very well, getting clues for 
conversation was easy, they felt relaxed, and they talked comfortably. 

From this result, we assume that our proposed system helps the progress of the 
conversation and reduces participants’ mental burden in comparison with the text
chat. 

• Conversation was animated ((b) of Table 1). 

They felt that with the proposed system the conversation became more cheerful, 
they could enjoy their conversation, and topics for conversation were abundant.

Judging from these results, we can say that our proposed system animated their 
conversation. 

• They obtained a good impression of their partner ((c) of Table 1). 

With our proposed system, they wanted to continue the conversation with the
partner, wanted to know more about the partner, felt familiarity with the partner, obtained
a good impression of the partner, and developed a sense of solidarity with the partner.

Judging from the above results, we can conclude that our proposed system had
positive effects on the participants’ interest in the conversation and their impressions
of their partners.

Table 1. Results of t-test comparisons of text chat and proposed system.

Questionnaire item                              Text chat   Proposed   t-value
Making topic was easy (a)                         -0.5        1.83      2.7*
Contents of story could be imagined well (a)       0.67       2.67      3.72**
Getting clue of conversation was easy (a)          0.5        2.17      1.93*
It was easy mentally (a)                           0.83       2.5       2.13*
Talked comfortably (a)                             1          2.33      2*
Conversation was cheerful (b)                      0.67       2         2*
Enjoyed conversation (b)                           1.17       2.33      2*
Topics were abundant (b)                           0.5        1.83      2.08*
Wanted to continue talking with partner (c)        0.83       2.17      2.64*
Wanted to know more about partner (c)              0.83       2.33      2.04*
Felt familiarity with partner (c)                  0.5        1.83      2.08*
Partner’s impression was good (c)                  1.17       2.5       2.08*
Felt sense of solidarity with partner (c)          0.5        2         1.96*

*p<0.05, **p<0.01
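
For readers who wish to see how a t-value of this kind can be obtained from per-participant ratings, the following sketch computes a paired t statistic. The text does not state which variant of the t-test was used, so the paired form is an assumption, and the ratings in the example are made-up illustrative numbers, not the experimental data.

```python
import math
import statistics

def paired_t(condition_a, condition_b):
    """Paired t statistic for two lists of ratings from the same participants."""
    diffs = [b - a for a, b in zip(condition_a, condition_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Note: sd_diff is zero when all differences are equal, which would
    # make the statistic undefined.
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Hypothetical ratings on the -3 to 3 scale for six participants.
text_chat = [0, 0, -1, 1, 0, -1]
proposed = [1, 2, 2, 2, 2, 2]
t_value = paired_t(text_chat, proposed)
print(round(t_value, 3))
```

The resulting statistic would then be compared with the critical value for the chosen significance level and degrees of freedom.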



5.3 Comparison of 3D Chat and Proposed System 

We found significant differences in 8 items of the questionnaire as a result of t-test 
comparisons between the 3D chat and our proposed system (Table 2). The columns of 
Table 2 are, from left to right, the questionnaire item, the average rating of the 3D 
chat, the average rating of our proposed system, and the t-value. The 3D chat was not 
rated higher on any of the questionnaire items. The results are summarized below. 

• They could communicate with each other more easily ((a) of Table 2). 

With our proposed system, participants reported that making topics for conversation 
was easy, the contents of the conversation could be imagined well, getting a clue for 
conversation was easy, and they felt relaxed. 

From these results, we assume that our proposed system helps the progress of the 
conversation and reduces participants’ mental burden in comparison with the 3D chat. 

• Conversation was animated ((b) of Table 2). 

They judged that, with our proposed system, their sense of making topics improved, 
conversations became cheerful, and topics were abundant. 

Thus, we can say that our proposed system had positive effects on their conversa- 
tion and animated their conversation. 



5.4 Evaluation to Examine Differences in Participants 

We also examined the experimental results with a focus on each participant. Each 
participant was rated by his partners in the questionnaire. Table 3 shows the average 
ratings from the questionnaire. All questionnaire items in Table 3 have a significant 
difference. The leftmost column of Table 3 shows the questionnaire items, and the other 
columns are the average ratings of each participant (A-F) by his partners. Basically, 
these ratings show the impression each participant made on his partners. There were 
clear differences among the participants’ ratings in Table 3. Ratings of participant E 
were lower than those of any other participant for 13 of 15 items, so he was the least 
typical participant. 



Table 2. Results of t-test comparisons of 3D chat and proposed system. 

Questionnaire item                              3D chat   Proposed   t-value
Making topic was easy (a)                         0.67      1.83      1.83*
Contents of story could be imagined well (a)      1.5       2.67      2.44*
Conversation progressed smoothly (a)              0.17      1.5       2.08*
Getting clue of conversation was easy (a)         0.67      2.17      2.04*
It was easy mentally (a)                          1.33      2.5       2.15*
I had a sense of making topics (b)               -1.33      0.5       1.94*
Conversation was cheerful (b)                     0.67      2         1.87*
Topics were abundant (b)                          0.33      1.83      2.18*

*p<0.05 



Table 3. Impression of each participant rated by his partners 

Questionnaire item                           A      B      C      D      E      F
Making topic was easy                       1.33   1      0      1      0.33   0.33
Contents of story could be imagined         1.33   2      1.67   1.67   1.33   1.67
Conversation progressed smoothly            0.67   0.67   1.33   1.67   0.33   0.33
Getting clue of conversation was easy       1.33   0      1.67   2      0      1.67
It was easy mentally                        1.67   1.33   2      2      0.67   1.67
I had a sense of making topics             -1.33  -0.33  -0.33   0      0     -1
Talked comfortably                          2      1.67   2.33   1.67   1      2
Conversation was cheerful                   1      0      2      1.33   0.33   2
Enjoyed conversation                        2      2      2      1.67   1.67   2.33
Topics were abundant                        0.67   0.67   1      1      0.67   1.33
Wanted to continue talking with partner     1.33   1.67   2      2      1.33   2
Wanted to know more about partner           1.67   1.33   2.33   2.33   1      2
Felt familiarity with partner               1.33   1.33   1.33   1.33   0.33   2.33
Partner’s impression was good               2      1.67   2.33   2      1.33   1.67
Felt sense of solidarity with partner       1.33   1.33   2      1      0.67   1.33
Therefore, we focused on participant E. The rating by each partner of participant E 
is shown in Table 4. The columns of Table 4 are, from left to right, the questionnaire 
item, the rating in the text chat (partner: B), the rating in the 3D chat (partner: A), 
and the rating in the proposed system (partner: C). The partners’ ratings of participant 
E are highest in the order: proposed system, 3D chat, text chat. Moreover, 10 ratings in 
the proposed system are higher than in any other system. Judging from the above, we 
assume that participant E made a better impression on his partner in the proposed system. 



Table 4. Ratings of participant E by his partners. 

Questionnaire item                          B (text chat)   A (3D chat)   C (proposed)
Making topic was easy                           -2               1              2
Contents of story could be imagined well        -1               2              3
Conversation progressed smoothly                -1               1              1
Getting clue of conversation was easy           -2               1              1
It was easy mentally                            -1               0              3
I had a sense of making topics                  -1              -1              2
Talked comfortably                              -2               2              3
Conversation was cheerful                       -1               1              1
Enjoyed conversation                            -1               3              3
Topics were abundant                            -1               1              2
Wanted to continue talking with partner         -1               3              2
Wanted to know more about partner               -2               2              3
Felt familiarity with partner                   -2               1              2
Partner’s impression was good                   -1               2              3
Felt sense of solidarity with partner           -2               1              3


5.5 Interview Results 

The results of the interviews with participants are shown in the following. We 
obtained both positive and negative opinions of the proposed system. 

Positive opinions 

• Changing the virtual space itself became a topic of our conversation. 

• Making topics was easy, since a topic could be obtained from the virtual space. 

• The progress of our conversation was smooth, since we could talk in a virtual 
space that was relevant to the contents of our conversation. 

Negative opinions 

• We could not talk comfortably; topics were often changed because the virtual 
space had changed many times. 

• It took much time to use the proposed system efficiently because its operation 
was relatively complicated. 

• We could not concentrate on the conversation when the virtual space was 
changed because we were absorbed in exploring the new space. 
The positive opinions were reflected in many answers from the questionnaire. 
Therefore, we assume that the proposed system is an effective system. However, from 
the negative opinions, we also found that the proposed system has characteristics that 
might disturb human conversation. 



6 Discussion 

The result of the questionnaire showed that our proposed system helped the progress 
and activation of the participants’ conversation more than either the text chat or the 
3D chat. From this, we assume that the participants could freely change the virtual 
space to make it relevant to their conversations, and thus the topics of the conversa- 
tions became abundant. We also obtained positive opinions from the interview de- 
scribed in the previous section. Moreover, the results of the questionnaire showed that 
the conversation using the proposed system gives better impressions of their partner 
than does the text chat. Thus, we concluded that conversation became easier and more 
active by using our system. From the results of the questionnaire, we could find no 
difference in the impression of the conversation partner between the 3D chat and the 
proposed system. However, the proposed system was more highly evaluated than the 
text chat and the 3D chat in the results of the experiment and the interview. There- 
fore, we believe that our proposed system can achieve even higher impressions of 
conversational partners by solving the problems raised in the interviews and devel- 
oping a new approach to supporting communication. 

Furthermore, we found a particular participant who made uncharacteristically low- 
rated impressions on his partners. These impressions improved gradually from the 
text chat to the 3D chat and finally to the proposed system. On the other hand, we 
found that our proposed system elicited negative feedback related to interference with 
communication, caused by such problems as complicated operations and frequent 
changes in conversational topics due to changes in the virtual spaces. 

We believe that more effective communication support can be realized by solving 
these problems, thus allowing participants to communicate with each other more 
comfortably. 



7 Conclusions 

We have described the implementation and experimental evaluation of our proposed 
system designed to support the dynamic formation of human relationships between 
strangers in networked virtual space. Our results show that by placing users in virtual 
settings that are “aware” of their conversation topics, we can create a compelling and 
powerful communication paradigm that promises to raise the quality of the resulting 
human relationship. Our system supports a more natural symbiosis of humans and 
information networks. In the future we will strive to realize a rich multimedia com- 
munication environment further supporting this natural symbiosis. 
Abstract. Long Short-Term Memory (LSTM) recurrent neural networks 
(RNNs) are local in space and time and closely related to a biolog- 
ical model of memory in the prefrontal cortex. Not only are they more 
biologically plausible than previous artificial RNNs, they also outper- 
formed them on many artificially generated sequential processing tasks. 
This encouraged us to apply LSTM to more realistic problems, such as 
the recognition of spoken digits. Without any modification of the un- 
derlying algorithm, we achieved results comparable to state-of-the-art 
Hidden Markov Model (HMM) based recognisers on both the TIDIGITS 
and TI46 speech corpora. We conclude that LSTM should be further 
investigated as a biologically plausible basis for a bottom-up, neural 
net-based approach to speech recognition. 



1 Introduction 

Identifying and understanding speech is an inherently temporal task. Not only 
the waveforms of individual phones, but also their duration, their ordering, and 
the delays between them all convey vital information to the human ear. While 
neural networks have dealt very successfully with certain temporal problems, 
they have so far been unable to fully accommodate the range and precision of 
time scales required for continuous speech recognition. This failing has left them 
a peripheral role in current speech technology. 

The aim of this paper is to re-examine the neural network approach to speech 
recognition (SR). In particular, we are interested in providing a more robust, and 
biologically plausible alternative to statistical learning methods such as HMMs. 
In Section 2, a summary is given of the problems that the approach has suffered 
from in the past. In Section 3, the network architecture we will use (Long Short 
Term Memory, or LSTM) is introduced, and its suitability for SR is discussed. 
In Section 4, experimental results for LSTM on a digit recognition task are 
provided. Concluding remarks and future directions are presented in Section 5, 
and pseudocode for the LSTM training algorithm is given in the Appendix. 

2 Neural Nets in Speech Recognition 

For neural nets, dealing with time dependent inputs (such as those present in 
speech) means one of two things: either collecting the inputs into time windows 
and treating the task as spatial; or using recurrent connections and an internal 
state to model time directly. 

There are two drawbacks to the application of time-windowed networks to 
speech recognition. Firstly, the window size is fixed by the topology of the net- 
work (and usually limited by speed and memory considerations). This means 
that either the net has a huge number of inputs (and therefore a huge num- 
ber of parameters and a very long training time), or else that important long 
time dependencies, such as the position of a word in a sentence, are simply 
ignored. Secondly, such nets are inflexible with regard to temporal displacements 
or changes in the rate of input (non-linear time warping), leaving them easily 
confused by variations in speech rate. 

With recurrent neural nets (RNNs), on the other hand, temporal patterns 
are not transformed into spatial ones. Instead, a time series is presented one 
frame at a time, with the flow of activations around the connections creating a 
memory of previous inputs. Recurrent training algorithms such as Backpropaga- 
tion Through Time (BPTT)[1,2] and Real Time Recurrent Learning (RTRL)[3] 
can perform weight updates based on the entire history of the network’s states. 
Therefore it seems feasible that they could process any length of time series. 
But in practice, these algorithms share a common weakness: their backpropagated 
errors either explode or decay in time, preventing them from learning 
dependencies of more than a few timesteps in length [4]. 
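
The explode-or-decay behaviour can be illustrated with a deliberately simplified sketch: a single self-connected unit, for which the error flowing back one timestep is multiplied by the recurrent weight times the slope of the activation function. The function name and the numerical values are illustrative assumptions, not taken from [4].

```python
def backprop_error_scale(weight, slope, steps):
    """Factor by which an error signal is scaled after flowing back `steps`
    timesteps through one self-connected unit: (weight * slope) ** steps."""
    return (weight * slope) ** steps


# A logistic unit has a maximum slope of 0.25.
decay = backprop_error_scale(weight=0.9, slope=0.25, steps=20)    # vanishes
blow_up = backprop_error_scale(weight=8.0, slope=0.25, steps=20)  # explodes
# A linear unit (slope 1) with a self-connection of weight 1.0 preserves the error.
carousel = backprop_error_scale(weight=1.0, slope=1.0, steps=20)

print(decay, blow_up, carousel)
```

Only the third case, a linear unit with a fixed self-connection of weight 1.0, keeps the error constant over time; this is the arrangement used for the internal state of an LSTM memory cell.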

The difficulties indicated above help to explain why Hidden Markov Models 
(HMMs), rather than neural nets, have become the core technology in speech 
recognition [5]. At first sight this is surprising, since their central premise is that 
the future behaviour of the system depends only on its current state (for example, 
that the probability of a phoneme depends only on which phoneme was before it). 
Moreover, they assume that observations (e.g. frames of speech) are statistically 
independent, which makes it difficult to model such effects as coarticulation (the 
blurring together of adjacent speech sounds). In fact, handling coarticulation and 
other contextual effects has been the most effective use of RNNs in the HMM 
framework. Work by Bourlard [6] and Robinson [7] showed that using RNNs to 
estimate output probabilities for HMMs (the hybrid approach) gave substantially 
improved performance. 

However, the hybrid approach represents an ad hoc combination of top-down 
linguistic modelling and bottom-up acoustic modelling. We feel that a consistent, 
bottom-up approach could be made to speech recognition by an RNN that could 
overcome such problems as long time dependencies and temporal distortions. In 
the following, we aim to test this claim with the Long Short Term Memory 
(LSTM) architecture first described in [8] and later extended in [9]. 

3 The LSTM Architecture 

LSTM is an RNN that uses self-connected unbounded internal memory cells 
protected by nonlinear multiplicative gates. Error is back-propagated through 
the network in such a way that exponential decay is avoided. The unbounded 
(i.e. unsquashed) cells are used by the network to store information over long 
time durations. The gates are used to aid in controlling the flow of information 
through the internal states. The cells are organized into memory blocks, each 
having an input gate that allows a block to selectively ignore incoming activations, 
an output gate that allows a block to selectively take itself offline, shielding it 
from error, and a forget gate that allows cells to selectively empty their memory 
contents. Note that each memory block can contain several memory cells. See 
Figure 1. Each gate has its own activation in the range [0, 1]. By using gradient 
descent to optimise weighted connections into gates as well as cells, an LSTM 
network can learn to control information flow. LSTM’s learning algorithm is local 
in space and time, with computational complexity per timestep and weight 
of O(1) for standard topologies (see the Appendix for details). This locality, in 
contrast with training algorithms such as Real Time Recurrent Learning and Back 
Propagation Through Time, makes LSTM more biologically plausible than most 
RNN architectures. Indeed, a recent report by O’Reilly [10] describes a closely 
related model of working memory in the basal ganglia and prefrontal cortex. 

[Figure: memory block diagram, with labels NET OUTPUT and NET INPUT] 

Fig. 1. LSTM memory block with one cell. The internal state of the cell is maintained 
with a recurrent connection of weight 1.0. The three gates collect activations from both 
inside and outside the block, and control the cell via multiplicative units (small circles). 
The input and output gates effectively scale the input and output of the cell while the 
forget gate scales the internal state, for example by resetting it to 0 (making it forget). 
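
The role of the three gates in the memory block of Fig. 1 can be sketched for a single cell as follows. This is a minimal illustration in which the gate activations are supplied by hand rather than learned, and the names `cell_step`, `logistic` and `g` are chosen here for illustration. The update rule matches the cell state and cell output equations of the Appendix.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def g(x):
    """Cell input squashing function: a logistic rescaled to the range [-2, 2]."""
    return 4.0 * logistic(x) - 2.0


def cell_step(state, cell_in, y_in, y_forget, y_out):
    """One timestep of a single memory cell, given gate activations in [0, 1].

    The state is carried over with weight 1.0, scaled by the forget gate;
    new input is admitted through the input gate; the output gate scales
    what the rest of the network sees.
    """
    new_state = y_forget * state + y_in * g(cell_in)
    return new_state, y_out * new_state


# Store a value: input gate open, output gate closed.
state, _ = cell_step(0.0, 3.0, y_in=1.0, y_forget=1.0, y_out=0.0)
stored = state

# Hold it for 100 timesteps while ignoring incoming activations.
for _ in range(100):
    state, out = cell_step(state, 5.0, y_in=0.0, y_forget=1.0, y_out=1.0)
held = state

# Reset the cell with the forget gate.
state, _ = cell_step(state, 0.0, y_in=0.0, y_forget=0.0, y_out=1.0)
```

With the input gate closed and the forget gate fully open, the stored value is unchanged after 100 timesteps, which is what allows the cell to bridge long time lags.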

3.1 Why Use LSTM for Speech Recognition? 

As mentioned in Section 2, it is essential for an RNN used in speech recognition 
to be able to bridge long time lags, and adapt to time warped data. These are 
two areas in which LSTM has outperformed other RNNs. That LSTM can deal 
with long time lags has been demonstrated in experiments such as [8,11], while 
its utility with time-warping is clear from its success in learning context free 
languages [11], and in generating music [12]. In both cases, its advantage comes 
from the fact that its central timing mechanism is not (as for most 
RNNs) a decaying flow of recurrent activation. Instead, its memory cells act as 
a set of independent counters. These cells (along with the gates used to open, 
close and reset them) allow LSTM to extract and store information at a very 
wide range of timescales. 

However, HMMs, rather than RNNs, are the standard tool for speech recog- 
nition, and the question we must ask is why use LSTM instead of HMMs? 
The answer is that statistical models like HMMs tend to be less general and 
less robust than RNNs, as well as less neurologically plausible. For example, the 
parameters and language models used in HMMs are tuned towards particular 
datasets, and the choice of acoustic model they use is dependent on the size of 
the corpus. Also, HMMs are very sensitive to channel errors, and coding 
algorithms are needed to clean up the data before they see it. LSTM, on the other 
hand, is a general purpose algorithm for extracting statistical regularities from 
noisy time series. Unlike HMMs, it doesn’t rely on the manual introduction of 
linguistic and acoustic models, but can learn its own internal models directly 
from the data. And although (like all neural nets) it does have free parameters, 
such as learning rate and layer size, we demonstrate below that these do not 
need to be adjusted for particular corpora. 

4 Experiments 

Two datasets were used in the following experiments. The first was a subset 
(containing only the single digit utterances) of the TIDIGITS corpus, collected 
by Texas Instruments from more than 300 adults and children. The second was 
a set of 500 randomly selected spoken digits from the TI46 corpus. The task on 
the TI46 data was to correctly identify ten digits “zero”, “one”, “two”,..., “nine”, 
while on the TIDIGITS data there was the additional digit “oh”. Unlike some 
experiments found in the literature, we did not separate the adult speakers from 
the children in TIDIGITS. In both cases the audio files were preprocessed with 
mel-frequency cepstral coefficient (MFCC) analysis, using the HTK toolkit [13], 
with the following parameters: 12 cepstral coefficients, 1 energy coefficient, and 
13 first derivatives, giving 26 coefficients in total. The frame size was 15 ms and 
the input window was 25 ms. 
4.1 Experimental Setup 

We used a neural net with a mix of LSTM and sigmoidal units (with a range 
of [0, 1]). The net had 26 inputs (one for each MFCC coefficient) and 11 (10) 
sigmoidal output units for TIDIGITS (TI46) - one for each digit. The classification 
was based on the most active output unit at the end of an input sequence 
(i.e. after each spoken digit). A cross-entropy objective function was used, and 
the output layer had a gain of 3. The network also had two hidden layers. The 
first of these was an extended LSTM layer with forget gates and peephole 
connections (see Section 3 for details). The layer contained 20 memory blocks, each 
with two cells and three gates, giving 100 units in total. The cell squashing function 
was logistic with range [-2, 2], and the activation functions of the gates were 
logistic with range [0, 1]. The bias weights to the LSTM forget (input and output) 
gates were initialised blockwise with positive (negative) values of +0.5 (-0.5) for the 
first block, +1.5 (-1.5) for the second block and so on. The second hidden layer 
consisted of 10 sigmoidal units. In total there were 121 units (excluding inputs) 
and 7791 weights. 
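
The blockwise bias initialisation can be sketched as follows, reading the bias values as +0.5, +1.5, ... for the forget gates and -0.5, -1.5, ... for the input and output gates. The function name and its `first` and `step` parameters are illustrative assumptions; the text only gives the values for the first two blocks and says "and so on".

```python
def staggered_gate_biases(n_blocks, first=0.5, step=1.0):
    """Return (forget, input, output) gate bias lists, one entry per memory block.

    Forget gates get positive biases; input and output gates get the
    corresponding negative biases, growing in magnitude block by block.
    """
    magnitudes = [first + step * j for j in range(n_blocks)]
    forget = [+m for m in magnitudes]
    inp = [-m for m in magnitudes]
    out = [-m for m in magnitudes]
    return forget, inp, out


forget_bias, input_bias, output_bias = staggered_gate_biases(n_blocks=20)
print(forget_bias[:3], input_bias[:3])
```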

Most of these parameters are standard and have been used in all our LSTM 
experiments. The biasing of the LSTM gates (as used in [9]) ensures that the 
input and output gates are initially open and the forget gate is initially closed. 
The staggering in this bias causes the blocks to become active sequence proces- 
sors one after another, which seems to aid in the subdivision of tasks between 
separate blocks. A smaller LSTM net, with only 10 memory blocks, was found 
to perform less well on this task; experiments with even larger nets failed to give 
any further improvement. The use of a squashing function with range [-2, 2] for 
the cells is also standard, and is helpful in that it allows the stored cell values to 
step down as well as up. The inclusion of the extra sigmoidal layer and the gain 
at the output layer facilitate classification tasks, as they tend to sharpen the 
network outputs towards one or zero. Cross entropy was used as the objective 
function because of its known affinity for identifying mutually exclusive classes 
[14]. 

The connection scheme was as follows: all units were biased, except for the 
input units. The input layer was fully connected to the LSTM layer. The LSTM 
layer was fully connected to itself, the hidden sigmoidal layer, and the output 
layer (note that the LSTM layer had only outputs from its cells, and not from 
its gates). The second hidden layer was fully connected to the output layer. 
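
The unit and weight totals quoted in this section can be checked against the connection scheme above. The sketch below is a back-of-the-envelope count under two assumptions that the text does not spell out: every LSTM unit (cell or gate) receives the recurrent connections from all cell outputs, and peephole connections run only from the cells of a block to the three gates of the same block. Variable names are illustrative.

```python
n_in, n_blocks, cells_per_block, gates_per_block = 26, 20, 2, 3
n_hidden, n_out = 10, 11  # 11 output units for TIDIGITS

n_cells = n_blocks * cells_per_block                  # cell outputs feeding other units
n_lstm_units = n_cells + n_blocks * gates_per_block   # cells plus gates
n_units = n_lstm_units + n_hidden + n_out

# Each LSTM unit sees all inputs, all cell outputs, and one bias.
lstm_weights = n_lstm_units * (n_in + n_cells + 1)
# Peepholes: each gate sees the states of the cells in its own block.
peephole_weights = n_blocks * gates_per_block * cells_per_block
# Hidden sigmoidal layer: fed by the cell outputs plus a bias.
hidden_weights = n_hidden * (n_cells + 1)
# Output layer: fed by the cell outputs, the hidden layer, and a bias.
output_weights = n_out * (n_cells + n_hidden + 1)

n_weights = lstm_weights + peephole_weights + hidden_weights + output_weights
print(n_units, n_weights)
```

Under these assumptions the count reproduces the 121 units and 7791 weights stated for the TIDIGITS network.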

The learning rate was 10^-6 and online learning was used, with weight updates 
at every timestep. The momentum algorithm from [15] was used with a value of 
0.9, and the network activations were reset to zero after each pattern presenta- 
tion. These parameters were experimentally determined, although we have not 
deviated from them significantly in any of our LSTM speech experiments. Errors 
were fed back on every timestep, encouraging the net to make correct decisions 
as early as possible (a useful property for real time applications). Gaussian noise 
was injected into the training data to prevent overfitting. 
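
A minimal sketch of the two training ingredients mentioned above, online updates with momentum and Gaussian input noise, is given below. It uses the classical momentum rule as a stand-in, since the exact formulation in [15] is not reproduced here, and the noise level `sigma` is a hypothetical parameter that the text does not specify.

```python
import random


def momentum_step(weights, grads, velocity, lr, momentum=0.9):
    """Classical momentum, applied in place: v <- momentum * v - lr * grad; w <- w + v."""
    for i, grad in enumerate(grads):
        velocity[i] = momentum * velocity[i] - lr * grad
        weights[i] += velocity[i]


def add_gaussian_noise(frame, sigma, rng):
    """Return a copy of one input frame with zero-mean Gaussian noise added."""
    return [x + rng.gauss(0.0, sigma) for x in frame]


# Online learning: one weight update per timestep.
w, v = [1.0], [0.0]
momentum_step(w, [2.0], v, lr=0.1)
momentum_step(w, [2.0], v, lr=0.1)

rng = random.Random(0)
noisy_frame = add_gaussian_noise([0.0] * 26, sigma=0.1, rng=rng)
```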
4.2 Results 

With the above setup, we achieved an error rate of 1.1% on the TIDIGITS data. 
The same network, with unaltered weights, achieved an initial error rate of 42% 
on the TI46 data, dropping to 0.5% after only six minutes of additional training. 
Restarting with randomised weights, we achieved an error rate of 2% on the 
TI46 data. Note that our results were actually better with the net previously 
trained on a different corpus, suggesting that an LSTM based recogniser could 
be well suited to incremental learning. 

A recent paper [16] gives an error rate of 2.1% on the TIDIGITS corpus with 
a state-of-the-art HMM based recogniser, when the data was coded to make it 
robust to channel errors. With two other, less robust, coding schemes, errors of 
0.7% and 0.4% were recorded. On the TI46 database, an error rate of 0.5% was 
recorded with a similar system. Although the best of these results are better 
than those achieved with LSTM, it should be pointed out that the HMM sys- 
tems were heavily tuned towards individual databases, that they have to suffer 
a drop in accuracy to become more robust to noise (to which neural nets are 
always robust), and that 20 years of research and development, incorporating 
knowledge from linguists and statisticians, have gone into achieving these fig- 
ures. For LSTM on the other hand, no specific adjustments have been made to 
improve its performance on speech. Furthermore, the kind of incremental learn- 
ing demonstrated above would be impossible with an HMM-based system, which 
would require complete retraining to switch from one corpus to another. 



5 Conclusions and Future Work 

The failure of traditional artificial RNNs in speech recognition is at least partly 
due to their problems with long time lags between relevant input signals. However, 
these problems have been overcome by LSTM, which is also more 
biologically plausible than traditional RNNs trained with algorithms such as BPTT 
and RTRL, as its learning algorithm is entirely local in space and time, and even 
related to a model of the prefrontal cortex. 

Previous work on LSTM has focused on artificially generated sequence pro- 
cessing tasks. Here, for the first time, we applied it to the task of spoken digit 
recognition, using data from the TIDIGITS and TI46 speech corpora. With no 
tailoring of the basic LSTM setup towards speech recognition, we obtained im- 
pressive results, already comparable to those recorded with specially constructed 
HMM-based systems that have some 20 years of research and development be- 
hind them. 

We are confident that LSTM has the potential to perform well on more com- 
plex continuous speech recognition tasks. We intend to extend our research by 
applying LSTM to automatic syllable and phone segmentation - one of the most 
challenging problems in text to speech applications - and by using articulatory 
features for word and phone level identification. 
Appendix: The LSTM Algorithm 

In the following we give detailed pseudocode for a single training step in the 
learning algorithm of extended LSTM (LSTM with forget gates and peephole 
connections). See [9] for more information on extended LSTM. 
As with other backpropagation algorithms, each training step contains two 
phases: the forward pass and the backward pass. In the case of LSTM the backward 
pass (where an error signal at the output layer is propagated backwards 
through the net) is only carried out if error is injected (i.e. if a target is 
presented). However, the calculation of the partial derivatives required for the weight 
updates must be carried out on every timestep, regardless of pattern presentation 
(hence this step is included in the forward pass). Note also that weight 
updates can be executed at any time - e.g. after every time step (online learning) 
or after every complete pass through the training set (batch learning). 
All the experiments in this paper used online learning. 



Pseudocode 



Notation: $j$ indexes memory blocks and $v$ indexes memory cells in block $j$. $c$ 
identifies a particular cell and $s_c$ identifies the internal state of that cell; so 
$c_j^v$ is cell $v$ in block $j$ and $s_{c_j^v}$ is its internal state. The memory cell 
squashing function is denoted $g$; in our experiments $g$ was a logistic sigmoid with 
range $[-2, 2]$. The synaptic weight from unit $a$ to unit $b$ is denoted $w_{ba}$, and 
previous activations (from one timestep ago) are marked with a hat, as in $\hat{y}$. 
$dS$ is the set of all partial derivatives used for weight updates. $net_a$ denotes the 
network (unsquashed) input to unit $a$; it is written $z_a$ in the pseudocode. 
$in_j$, $out_j$ and $\varphi_j$ are respectively the input gate, output gate and forget 
gate in block $j$; likewise $f_{in_j}$, $f_{out_j}$ and $f_{\varphi_j}$ are the squashing 
functions of these gates, all of which (for all blocks) are logistic sigmoids with range 
$[0, 1]$ for the purposes of this paper. The activation of a generic unit $a$ is denoted 
$y_a$. During the weight updates, $\Delta w_{ba}$ is the change applied to the weight 
from unit $a$ to unit $b$. $\alpha$ is the network learning rate. 



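Initialise network: 

  states: $s_{c_j^v} = \hat{s}_{c_j^v} = 0$; partial derivatives: $dS = 0$; activations: $y = \hat{y} = 0$; 

Forward pass: 

  input units: $y$ = current external input; 
  begin new timestep: activations: $\hat{y} = y$; cell states: $\hat{s}_{c_j^v} = s_{c_j^v}$; 
  loop over memory blocks, indexed $j$ { 
    input gates: 
      $z_{in_j} = \sum_m w_{in_j m}\,\hat{y}_m + \sum_{v=1}^{S_j} w_{in_j c_j^v}\,\hat{s}_{c_j^v}$;  $y_{in_j} = f_{in_j}(z_{in_j})$; 
    forget gates: 
      $z_{\varphi_j} = \sum_m w_{\varphi_j m}\,\hat{y}_m + \sum_{v=1}^{S_j} w_{\varphi_j c_j^v}\,\hat{s}_{c_j^v}$;  $y_{\varphi_j} = f_{\varphi_j}(z_{\varphi_j})$; 
    cell states: 
      loop over the $S_j$ cells in block $j$, indexed $v$ { 
        $z_{c_j^v} = \sum_m w_{c_j^v m}\,\hat{y}_m$;  $s_{c_j^v} = y_{\varphi_j}\,\hat{s}_{c_j^v} + y_{in_j}\,g(z_{c_j^v})$; } 
    output gate activation: 
      $z_{out_j} = \sum_m w_{out_j m}\,\hat{y}_m + \sum_{v=1}^{S_j} w_{out_j c_j^v}\,s_{c_j^v}$;  $y_{out_j} = f_{out_j}(z_{out_j})$; 
    cell outputs: 
      loop over the $S_j$ cells in block $j$, indexed $v$ { $y_{c_j^v} = y_{out_j}\,s_{c_j^v}$; } 
  } end loop over memory blocks 
  output units: $z_k = \sum_m w_{km}\,y_m$;  $y_k = f_k(z_k)$; 

  partial derivatives: 
  loop over memory blocks, indexed $j$ { 
    loop over the $S_j$ cells in block $j$, indexed $v$ { 
      cells, ($dS_{cm}^{jv} := \partial s_{c_j^v} / \partial w_{c_j^v m}$): 
        $dS_{cm}^{jv} = dS_{cm}^{jv}\,y_{\varphi_j} + g'(z_{c_j^v})\,y_{in_j}\,\hat{y}_m$; 
      input gates, ($dS_{in,m}^{jv} := \partial s_{c_j^v} / \partial w_{in_j m}$, $dS_{in,c_j^{v'}}^{jv} := \partial s_{c_j^v} / \partial w_{in_j c_j^{v'}}$): 
        $dS_{in,m}^{jv} = dS_{in,m}^{jv}\,y_{\varphi_j} + g(z_{c_j^v})\,f'_{in_j}(z_{in_j})\,\hat{y}_m$; 
        loop over peephole connections from all cells, indexed $v'$ { 
          $dS_{in,c_j^{v'}}^{jv} = dS_{in,c_j^{v'}}^{jv}\,y_{\varphi_j} + g(z_{c_j^v})\,f'_{in_j}(z_{in_j})\,\hat{s}_{c_j^{v'}}$; } 
      forget gates, ($dS_{\varphi m}^{jv} := \partial s_{c_j^v} / \partial w_{\varphi_j m}$, $dS_{\varphi c_j^{v'}}^{jv} := \partial s_{c_j^v} / \partial w_{\varphi_j c_j^{v'}}$): 
        $dS_{\varphi m}^{jv} = dS_{\varphi m}^{jv}\,y_{\varphi_j} + \hat{s}_{c_j^v}\,f'_{\varphi_j}(z_{\varphi_j})\,\hat{y}_m$; 
        loop over peephole connections from all cells, indexed $v'$ { 
          $dS_{\varphi c_j^{v'}}^{jv} = dS_{\varphi c_j^{v'}}^{jv}\,y_{\varphi_j} + \hat{s}_{c_j^v}\,f'_{\varphi_j}(z_{\varphi_j})\,\hat{s}_{c_j^{v'}}$; } 
  } } end loops over cells and memory blocks 

Backward pass (if error injected): 

  error injected into output units (indexed $k$): $e_k = t_k - y_k$; 
  $\delta$s of output units: $\delta_k = f'_k(z_k)\,e_k$; 
  $\delta$s of non-LSTM units connected to outputs: $\delta_i = f'_i(z_i)\,\bigl(\sum_k w_{ki}\,\delta_k\bigr)$; 
  loop over memory blocks, indexed $j$ { 
    $\delta$s of output gates: 
      $\delta_{out_j} = f'_{out_j}(z_{out_j})\,\bigl(\sum_{v=1}^{S_j} s_{c_j^v} \sum_k w_{k c_j^v}\,\delta_k\bigr)$; 
    internal state error: 
      loop over the $S_j$ cells in block $j$, indexed $v$ { 
        $e_{s_{c_j^v}} = y_{out_j}\,\bigl(\sum_k w_{k c_j^v}\,\delta_k\bigr)$; } 
  } end loop over memory blocks 

  weight updates: 
  output units: $\Delta w_{km} = \alpha\,\delta_k\,y_m$; 
  loop over memory blocks, indexed $j$ { 
    output gates: 
      $\Delta w_{out_j m} = \alpha\,\delta_{out_j}\,\hat{y}_m$;  $\Delta w_{out_j c_j^v} = \alpha\,\delta_{out_j}\,s_{c_j^v}$; 
    input gates: 
      $\Delta w_{in_j m} = \alpha \sum_{v=1}^{S_j} e_{s_{c_j^v}}\,dS_{in,m}^{jv}$; 
      loop over peephole connections from all cells, indexed $v'$ { 
        $\Delta w_{in_j c_j^{v'}} = \alpha \sum_{v=1}^{S_j} e_{s_{c_j^v}}\,dS_{in,c_j^{v'}}^{jv}$; } 
    forget gates: 
      $\Delta w_{\varphi_j m} = \alpha \sum_{v=1}^{S_j} e_{s_{c_j^v}}\,dS_{\varphi m}^{jv}$; 
      loop over peephole connections from all cells, indexed $v'$ { 
        $\Delta w_{\varphi_j c_j^{v'}} = \alpha \sum_{v=1}^{S_j} e_{s_{c_j^v}}\,dS_{\varphi c_j^{v'}}^{jv}$; } 
    cells: 
      loop over the $S_j$ cells in block $j$, indexed $v$ { 
        $\Delta w_{c_j^v m} = \alpha\,e_{s_{c_j^v}}\,dS_{cm}^{jv}$; } 
  } end loop over memory blocks 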
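
The forward pass of the pseudocode can be sketched in Python for a single memory block as follows. This is an illustrative sketch rather than the authors' implementation: the weight layout (a dictionary with the keys shown in the docstring) and the function names are assumptions made here, and the partial derivatives and backward pass are omitted.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def g(x):
    """Cell input squashing function: logistic rescaled to the range [-2, 2]."""
    return 4.0 * logistic(x) - 2.0


def dot(ws, xs):
    return sum(w * x for w, x in zip(ws, xs))


def block_forward(w, y_prev, s_prev):
    """One forward step for one memory block with forget gate and peepholes.

    w (hypothetical layout):
      'in', 'phi', 'out'                 weights from the source units
      'in_peep', 'phi_peep', 'out_peep'  peephole weights from the block's cells
      'cell'                             one list of source weights per cell
    y_prev: source activations from one timestep ago (y-hat)
    s_prev: cell states from one timestep ago (s-hat)
    Returns the new cell states and the cell outputs.
    """
    # Input and forget gates look at the previous cell states.
    y_in = logistic(dot(w['in'], y_prev) + dot(w['in_peep'], s_prev))
    y_phi = logistic(dot(w['phi'], y_prev) + dot(w['phi_peep'], s_prev))
    # Cell states: gated old state plus gated, squashed new input.
    s = [y_phi * sp + y_in * g(dot(wc, y_prev))
         for wc, sp in zip(w['cell'], s_prev)]
    # The output gate looks at the current cell states.
    y_out = logistic(dot(w['out'], y_prev) + dot(w['out_peep'], s))
    # Cell outputs are the gated, unsquashed states.
    y_c = [y_out * sv for sv in s]
    return s, y_c


# Two cells, two source units, all weights zero: every gate sits at 0.5.
zero_w = {
    'in': [0.0, 0.0], 'phi': [0.0, 0.0], 'out': [0.0, 0.0],
    'in_peep': [0.0, 0.0], 'phi_peep': [0.0, 0.0], 'out_peep': [0.0, 0.0],
    'cell': [[0.0, 0.0], [0.0, 0.0]],
}
states, outputs = block_forward(zero_w, y_prev=[1.0, 0.5], s_prev=[2.0, -1.0])
```

With all weights at zero the gates take the value 0.5 and the cell input contributes nothing, so each state is simply halved and each output is half of the new state.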
Abstract. This paper discusses Tangible User Interfaces (TUIs) and their 
potential impact on cognitive assessment and cognitive training. We believe that 
TUIs, and particularly a subset that we dub spatial TUIs, can extend human 
computer interaction beyond some of its current limitations. Spatial TUIs ex- 
ploit human innate spatial and tactile ability in an intuitive and direct manner, 
affording interaction paradigms that are practically impossible using current 
interface technology. As proof-of-concept we examine implementations in the 
field of cognitive assessment and training. In this paper we use Cognitive 
Cubes, a novel TUI we developed, as an applied test bed for our beliefs, pre- 
senting promising experimental results for cognitive assessment of spatial abil- 
ity, and possibly for training purposes. 



1 Introduction 

We are exploring a new breed of interface between human and computers, spatial 
Tangible User Interfaces (TUIs). Our research efforts are geared towards demon- 
strating that spatial TUIs are practical for resolving meaningful real-life problems, 
while providing considerable benefits over existing solutions and revealing new pos- 
sibilities that were not viable without them. Spatial TUIs are still in an infant state of 
development, and can afford only a simple level of expressiveness. The search for 
meaningful and significant spatial TUIs applications thus becomes quite challenging. 
We chose to apply spatial TUIs in the cognitive assessment domain. On one hand we 
identified a need for spatial, tangible interaction with computers for cognitive assess- 
ments. On the other hand, we found that high level of spatial expressiveness (that is, 
the required level of detail, miniaturization and spatial flexibility) is not required 
during cognitive assessment interactions and can even be seen as a drawback. 

One of the vehicles we developed for our proof-of-concept is Cognitive Cubes — a 
specialized spatial TUI for cognitive assessment of constructional and spatial ability. 
The Cognitive Cubes theme follows a simple assessment model: show participants a 
virtual 3D (Three-Dimensional) prototype and ask them to reconstruct it physically 
with a spatial TUI. The prototype presented to the participants is an abstract 3D geo- 
metrical shape, constructed of generic-looking building blocks. The TUI consists of a 
set of identical physical building blocks affording 3D construction, much like Lego™
blocks. 

The cognitive abilities we measure, namely spatial and constructional abilities, are 
skills essential for independent living. Their assessment is an important practical and 
clinical diagnostic tool and is also indispensable in scientific study of cognitive brain 
functions. Techniques for assessment include asking patients or participants to per- 
form purely cognitive tasks such as mental rotation, as well as constructional tasks 
involving arrangement of blocks and puzzle pieces into a target configuration. These 
constructional tasks have the advantage of probing the ability to perceive, plan, and 
act in the world. Studies suggest that assessment with 3D forms of these tasks may be 
most demanding and sensitive. However, use of 3D tasks in assessment has been 
limited by their inherent complexity, which requires considerable examiner training, 
time and effort if scoring is to be consistent and reliable. 

With Cognitive Cubes we demonstrated the first automatic tool for simple, reliable 
and consistent 3D constructional ability assessment. In this paper we discuss several 
interdisciplinary aspects of Cognitive Cubes, a fusion of concepts extracted from 
Human Computer Interaction (HCI) and cognitive assessment. We follow by pre- 
senting Cognitive Cubes and detailing the experimental work performed with the 
system. We emphasize our comparative analysis of Cognitive Cubes and the paper- 
based Mental Rotation Test (MRT), and the tentative results indicating unexpected 
improvement in the MRT results after training with Cognitive Cubes. 



2 Spatial Tangible User Interfaces 

HCI research is a multi-disciplinary effort attempting to lower the barriers between 
people and computers. A key facet of HCI research is directed at the exploitation of 
our innate tactile and spatial abilities. A successful outcome of this effort is the com- 
mon mouse. However, human computer interfaces still largely fail to capture the full 
capacity of our innate ability to manipulate tangible objects. This can be simply dem- 
onstrated by comparing the ease of moving, manipulating, assembling and disassem- 
bling physical objects with the difficulties of performing similar tasks in a 3D com- 
puter-based virtual world using the WIMP (Windows-Icon-Menu-Pointer) interface. 
Recently, the notion of a tangible user interface (TUI) has emerged [17], suggesting 
more elaborate use of physical objects as computer interfaces. Ullmer and Ishii, from 
the Tangible Media Group at the MIT Media Lab, define TUIs as “devices that give 
physical form to digital information, employing physical artifacts as representations 
and controls of the computational data” [17]. TUIs make sense since they engage our 
natural talents for handling every-day objects in the physical world. We believe TUIs’
uniqueness lies in their spatiality; following this line of thought we define Spatial 
TUIs as tangible user interfaces used to mediate interaction with shape, space and 
structure in the virtual domain. 

Based on previous research, we introduce simple explanations of what makes our
tangible-physical interaction seem natural: the support of intuitive spatial mappings, 
the support of unification of input and output (or I/O unification) and the support of 
trial-and-error actions. 

1. Spatial Mapping. We define spatial mapping as the relationship between the ob- 
ject’s spatial characteristics and the way it is being used. The physical world usu- 
ally offers clear spatial mapping between objects and their functions (see also [8]). 
In the digital world most user interaction techniques, particularly in 3D modeling, 
include a set of rules and controls that manipulate their functionality. However, 
these rules and controls, implemented with the restrictions of the WIMP interface, 
are far from enabling an intuitive spatial mapping between interface and applica- 
tion. 

2. I/O Unification. Interaction with physical objects naturally unifies input and out- 
put. We see two components to this unification: (1) the clarity of state; the state of 
physical interfaces in the physical world is usually clearly represented. In HCI this 
is not a given and the state of the application and the interface do not necessarily 
mirror each other. (2) the coupling of action and perception space; when we inter- 
act with an object in the physical world, our hands and fingers (parts of our action 
space) coincide, in time and space, with the position of the object we are handling 
(part of our perception space) [1,2]. This spatial and temporal coupling of percep- 
tion space and action space focuses attention at one time and place, and enables us 
to perform complex tasks. Yet the WIMP interaction paradigm separates mouse 
from screen, input from display, and action from perception. This requires the user 
to divide her attention, and mentally map one space to the other. 

3. Support of “Trial-and-error” Actions. When we build a physical 3D model, we 
actually perform an activity that is both cognitive, or goal related, and motorized 
[1,2]. Such a physical task involves both pragmatic and epistemic actions [1,2,4]. 
Pragmatic actions can be defined as the straightforward maneuvers we perform in 
order to bring the 3D shape closer to our cognitive goal. Epistemic (or “trial and 
error”) actions, on the other hand, use the physical setting in order to improve our 
cognitive understanding of the problem. Some of these epistemic maneuvers will 
fail and will not bring us any closer to our goal, while others will reveal new in- 
formation and directions leading to it. In fact, this information might have been 
very hard to find without trial-and-error [4]. The WIMP interface is geared towards 
pragmatic actions, with poor support for epistemic actions. For example, the 
“undo” operation is linear, meaning that to “undo” a single erroneous operation, 
you have to also “undo” all the operations that followed it. 

We employ these three simple observations of the physical world as heuristics for 
designing effective TUIs. 



3 On Cognitive Assessment and Technology

When addressing the question of intelligence, Edwin Boring said in 1923 that “intel- 
ligence is what the tests test” [10]. Similarly, any assessment of human cognition is 
shaped, and limited, by the tools it employs. Examining the technological component 
of cognitive assessment techniques reveals a stagnant state. In 1997 Robert Sternberg 
highlighted that intelligence testing had changed very little over the last century, 
making little use of the powerful technology presently available [11,13,16]. 

Many research endeavors are targeting this deficiency. Since cognitive assessment is 
all about attempting to have a glimpse of human cognition, state of the art HCI tech- 
nology should have a dramatic impact on the field. VR, a far-reaching HCI paradigm, 
is already being exploited as a research test bed for a number of novel cognitive as- 
sessments [10-13]. We see TUIs playing an important role in pushing forward the 
field of cognitive assessment. 

Cognitive assessment is a scientific attempt to study cognition and measure human 
behavior [7]. Testing human behavior involves giving the participant an opportunity 
to “behave” and measuring it. A measurement tool should be reliable (yielding the 
same results consistently on different occasions) and valid (measure what it is sup- 
posed to measure) [10]. Obviously the measurement tool should be sensitive, safe and 
should offer the assessor full control over the data collection process [13]. 

Allowing the participant “to behave” involves the presentation of stimuli which trig- 
ger recordable reactions by the participant. Arguably, many classic, paper-pencil 
cognitive assessment tests offer very limited stimuli, little freedom to behave and low 
ecological validity (that is, little relevance to normal, everyday human behavior in the
real world) [10]. 



3.1 Automation of Cognitive Assessment 

Most major psychological paper-pencil tests have been automated or are expected to
be automated in the near future [3]. The immediate advantage of this kind of automation
is the saving in professionals’ time: the computer tirelessly samples the participant’s
actions, reliably stores them, and can draw on a vast body of assessment knowledge,
dramatically reducing the expertise required of the assessor. Other obvious advantages
of automation are an extremely high density of measurement, an elimination of
tester bias and a potential improvement in test reliability. Computerized tests can also
be sensitive to response latency, and can tailor questions to the examinee’s past
answers [3]. Automated assessment has also been criticized, with concern
focused on miscalibration of tests with respect to their written parallels and misuse of
tests by unqualified examiners.

We share the view that this kind of straightforward automation portrays merely the tip
of the iceberg for automation, and that many of the naysayers’ arguments against
automation are based on tradition rather than on scientific vision. Robert Sternberg
suggested automation-supported “dynamic assessment”, where tests targeting learning
offer guided performance feedback to the participant [13,16]. Major efforts address
the potential of VR for cognitive assessment. VR-based cognitive assessment should
afford all the obvious benefits of automation, particularly almost ideal assessment
reliability [10,13].

Albert Rizzo and his colleagues (and others) promote VR-based cognitive assessment
as a breakthrough in the field, enhancing assessment validity and everyday relevance
[9,10,12,13]. VR-based cognitive assessment can “objectively measure behavior in
challenging but safe, ecologically valid environments, maintaining experimental
control over stimulus delivery and measurement” [13]. VR-based cognitive assessment
also introduces many new challenges. One largely unaddressed need is the
analysis of the huge number of measurements the automated tools extract (“drowning in
data” [10]), compared to the simplistic measures of traditional assessment (commonly
a single time-to-completion measure per task).



3.2 Probing Constructional Ability 

Muriel Lezak defines constructional functions as “perceptual activity that has motor
response and a spatial component” [6]. Constructional ability can be assessed by
visuoconstructive, spatial tasks that involve assembling, building and drawing. In a
typical constructional assessment, the participant is presented with a spatial pattern
and is asked to mimic it by manipulating or assembling physical objects [6]. The test
administrator scores participant performance using measures such as time-to-completion
and accuracy, or more demanding observations such as order of assembly
and strategy analysis. As far as we know, none of these tests were ever automated or
computerized.

Constructional functions and disorders can be associated with impairments such as
lesions of the non-speech, right hemisphere of the brain and early phases of AD
(Alzheimer Disease), and can be useful in their assessment [3,6]. Constructional function
assessment based on the assembly of physical tangible objects generates assessment
tools that are non-verbal, relatively culture-free and can be very sensitive to and
selective for constructional ability alone [3]. 2D (Two-Dimensional) and 3D
constructional tasks have been shown to distinguish between different levels of impairment,
suggesting that the more complex 3D construction tasks might be more sensitive to
visuoconstructive deficits that were not noticeable on the simpler 2D tasks [6].

2D constructional assessments are widely used. WAIS (Wechsler Adult Intelligence
Scale) contains two 2D physical construction subtests, Block Design and Object
Assembly [3,6]. In the former, the participant arranges red and white blocks to copy a
presented pattern. In Object Assembly, the participant solves a 2D puzzle. Measures
for both tests are based on time and accuracy [3,6].

3D constructional assessments are far less common. Two examples are [6]: Block
Model from Hecaen et al., and Three Dimensional Constructional Praxis from Benton
et al. In both of these tests the participant tries to match a 3D prototype using
wooden blocks, and is scored on time and accuracy. The use of Lego blocks was 
suggested for 3D tests [6], but to our knowledge was never implemented. Given the 
complexity of the target shapes in these 3D assessments, manual scoring of even 
simple measures such as accuracy can be very difficult. Manual scoring of denser 
measures such as order and strategy would certainly require a very skilled, trained 
and alert assessor. 



3.3 Mental Rotation Test

The Mental Rotation Test (MRT) is a paper-pencil based assessment of the visuospa- 
tial ability to “turn something over in one’s mind” [12,15]. This ability underlies 
many everyday activities, for example, using a map, or some components of driving 
[12]. The MRT is 3D and spatial, but in its common form it has neither physical
nor constructional components and is purely cognitive.

The MRT is based on early work by Shepard and Metzler [15] that was further established
in Vandenberg and Kuse’s MRT [18]. MRT participants are presented with a
group of five perspective drawings of 3D objects. One of them is the prototype (the
“criterion”) object; the rest consist of two objects identical to the prototype but
rotated, and two “distractor” objects (mirror images of the prototype or simply
different objects). The participant is asked to find and mark the two objects that are
identical to the prototype object [18].

It was shown that the time needed to determine whether two MRT perspective draw- 
ings of objects are similar or not is a linear function of the angular difference between 
them [15], suggesting that people perform the MRT tasks mentally as if they were 
physically rotating the objects. The MRT’s almost perfect linear relationship between 
task difficulty and observable human behavior is rare in cognitive assessment; fol- 
lowing this relationship the MRT received considerable attention and was extensively 
researched. The Virtual Reality Spatial Rotation (VRSR) is an automated, VR-based 
derivative of the MRT [12]. In the VRSR participants are asked to manually orient an 
MRT-like object until it is superimposed on a target prototype. The VRSR adds a 
motoric component and enhances the ecological validity of the MRT by presenting 
the task in a highly immersive VR environment and by enabling the participants to 
manipulate the virtual object using a tracked physical prop [12]. 



4 Cognitive Cubes 

Cognitive Cubes was designed as an automated tool for examination of 3D spatial 
constructional ability [14]. Cognitive Cubes makes use of ActiveCube [5], a Lego- 
like tangible user interface for description of 3D shape. With Cognitive Cubes, users 
attempt to construct a target 3D shape, while each change of shape they make is 
automatically recorded and scored for assessment. 

We created Cognitive Cubes closely following our TUI design heuristics (Section 2). 
First and foremost, Cognitive Cubes offers a very intuitive spatial mapping between
the TUI and the assessment task. Most of the constructional assessment activity is 
performed entirely in the physical domain, using the physical cube-based TUI which, 
much like Lego blocks, naturally affords constructional activity. The assessment task 
involves the presentation of a virtual 3D prototype that the participant attempts to 
physically reconstruct. We kept the virtual prototype in close visual agreement with 
the physical cubes, texturing it with a detailed matching texture, sampled from the 
physical cubes (see Fig. 1 and Fig. 2). 

At first glance, Cognitive Cubes does not offer strong I/O unification because the
virtual prototype is presented separately from the physical interface. This argument
would have been true if Cognitive Cubes were used for 3D design; however, Cognitive
Cubes was used for cognitive assessment. The prototype presented to the participant
is merely a visual representation of the cognitive goal the participant is expected
to reach, and in this sense the prototype is external to the interaction. A tighter
coupling between the presented prototype and the physical TUI would leave very little
challenge in the constructional task. We argue that Cognitive Cubes offers good I/O
unification since the input (actions performed on the 3D cubes) fully coincides with
the output (virtual 3D shapes registered at the host computer).

Lastly, Cognitive Cubes, like many other construction sets, offers extremely flexible 
exploration of the design domain and trial-and-error actions. Participants can perform 
actions on the 3D structure in any desired order, undoing their former actions in a 
completely nonlinear fashion (that is, undoing actions in an order that does not follow 
the construction order). 

As far as we know, Cognitive Cubes is the first computerized tool for assessment of
constructional ability, combining the increased sensitivity of 3D constructional tasks 
with the efficiency, consistency, flexibility and detailed data collection provided by 
automation. 



4.1 Hardware 



To measure 3D constructional abilities we needed an interface that would maintain a 
high level of 3D physical constructional expressiveness while enabling precise real- 
time sensing of the structure’s geometry. We chose ActiveCube [5] as the infrastruc- 
ture to Cognitive Cubes. ActiveCube is arguably the best current example of a spatial 
3D TUI for structural input, enabling real-time, interactive, step-by-step, geometry 
sampling of a 3D structure. 

Fig. 1. Cognitive Cubes: virtual prototype (left); physical interaction (right)

ActiveCube consists of a set of plastic cubes (5 cm/edge) that can be attached to one
another using male-female connectors (employing simple clothing-like snaps), forming
both a physical shape and a network topology. Each cube and cube face has a
unique ID. A host PC is connected to a special base cube and communicates with the
small CPUs in each cube through a broadcast mechanism to sense the (dis)connection
of any cube. Since all cubes have the same size and shape, any topology represents a
unique collective shape [5]. ActiveCube capabilities include more than 3D geometry
input. The cubes are equipped with a large variety of input and output devices,
supporting flexible interaction paradigms. Some of the cubes are equipped with
ultrasonic, optical (visible and IR), tactile, gyroscopic or temperature sensors. Other
cubes are equipped with light, audio, motor and vibration actuators [5].
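
The observation that a connection topology of identical cubes determines a collective
shape can be illustrated with a minimal sketch. It is not the ActiveCube protocol:
the face names, the (parent, face, child) link format, and the function name are
hypothetical, and the sketch assumes faces are reported in the base cube's frame, so
it ignores any relative rotation of attached cubes.

```python
from collections import deque

# Unit offsets for the six faces of a cube. The face names are hypothetical and
# are assumed to be expressed in the base cube's coordinate frame.
FACE_OFFSETS = {
    "+x": (1, 0, 0), "-x": (-1, 0, 0),
    "+y": (0, 1, 0), "-y": (0, -1, 0),
    "+z": (0, 0, 1), "-z": (0, 0, -1),
}


def topology_to_shape(base_id, connections):
    """Return {cube_id: (x, y, z)} for every cube reachable from the base cube.

    connections is an iterable of (parent_id, face, child_id) tuples meaning
    that child_id is attached to the named face of parent_id.
    """
    children = {}
    for parent, face, child in connections:
        children.setdefault(parent, []).append((face, child))

    positions = {base_id: (0, 0, 0)}  # the base cube sits at the origin
    queue = deque([base_id])
    while queue:  # breadth-first walk over the connection network
        cube = queue.popleft()
        x, y, z = positions[cube]
        for face, child in children.get(cube, []):
            if child in positions:
                continue  # already placed via another path
            dx, dy, dz = FACE_OFFSETS[face]
            positions[child] = (x + dx, y + dy, z + dz)
            queue.append(child)
    return positions


# Hypothetical four-cube structure: cube 0 is the base cube.
links = [(0, "+x", 1), (1, "+z", 2), (0, "-y", 3)]
shape = topology_to_shape(0, links)
```

Because every cube has the same size, integer lattice coordinates are enough to
describe the shape; a real implementation would also have to track cube orientation.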

To support our constructional ability assessment paradigm, and to allow us to assess
participants with diverse constructional abilities (we were planning to approach
young participants, elderly participants, and participants with mild Alzheimer
disease), the Cognitive Cubes hardware had to support the following functions: (i) allow
flexible 3D geometry input by assembly of physical cubes; (ii) sample the physical 3D
cube structure in real time; (iii) allow easy handling of the hardware, so that cubes
can be connected to, and disconnected from, each other in a straightforward manner.

To meet these requirements, Cognitive Cubes needed only a subset of ActiveCube
capabilities, namely interactive 3D geometry input. In this sense, the additional
ActiveCube input and output capabilities could well be distracting for
Cognitive Cubes purposes. We decided to work only with a generic ActiveCube,
using cubes with the same color and shape, without any of the extra ActiveCube
sensors or actuators.

To ease the connectivity of the cubes we added a blue stripe to each of the cube
faces. To snap the connectors together properly, the user must either match
the male-female connectors, or match the two blue stripes on the two connecting
faces (see Fig. 1 and Fig. 2 for Cognitive Cubes appearance).



4.2 Software 

To assess constructional ability using Cognitive Cubes, we needed to present the 
participant with the prototype she is asked to construct. We chose a virtual display, 
rather than an actual physical model, as our prototype presentation method. While 
virtual displays impose a certain level of abstraction (the virtual object is not really
there), they can offer relatively high levels of realism and afford an extremely flexible 
prototype presentation, enabling us to test and edit easily the vocabulary of our pro- 
totype shapes. We chose to project the 3D virtual prototype in front of the participant 
using a monoscopic digital rear projector and a large screen. Viewing all the proto- 
type aspects was achieved by continuously rotating the virtual prototype at a constant 
rate around its vertical axis, providing 3D depth information. The rotating prototype 
also engages the participant in mental rotation and use of memory as the virtual pro- 
totype and the physical cubes orientations match only periodically. After several 
iterations we fixed the prototype rotation at a slow 2.7 revolutions per minute rate. 
Other than the display, Cognitive Cubes software supported interaction with minor
audio cues: when the participant connects a cube to the structure a distinct chime 
sounds through a speaker. If the participant chooses to disconnect a cube, a different 
chime sounds. 

During an experiment, the assessor could easily switch between virtual prototypes
using a simple menu tool. The software did not generate any cues about the precision
of the physical Cognitive Cubes structure vis-a-vis the virtual prototype. Hence the
participant worked freely and, when satisfied with the match between her construction
and the prototype, informed the assessor, who advanced the system to the next
trial. The assessor could also choose to stop the assessment at any point if, for
example, the participant was not making any progress.

While the participant attempts to reconstruct the virtual prototype, Cognitive Cubes
collects a data vector for each participant action, containing the following values:
(i) Event time: in seconds, measured from the time the virtual prototype appeared on
the display. (ii) Action type: cube connection or disconnection. (iii) Cube location:
a set of Cartesian coordinates, measured from the base cube, which is located at the
origin.
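
As an illustration only, the per-action data vector could be held in a record like
the one sketched below. The names CubeEvent, Action and replay are hypothetical and
are not taken from the Cognitive Cubes software; the sketch also assumes that each
event adds or removes exactly one cube and that coordinates are integer lattice
positions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class Action(Enum):
    CONNECT = "connect"
    DISCONNECT = "disconnect"


@dataclass(frozen=True)
class CubeEvent:
    time_s: float                    # seconds since the prototype appeared
    action: Action                   # cube connection or disconnection
    location: Tuple[int, int, int]   # lattice coordinates, base cube at origin


def replay(events):
    """Rebuild the structure after every event.

    Returns a list of (time_s, frozenset_of_cells) snapshots in time order.
    Assumes each event affects a single cube.
    """
    cells = {(0, 0, 0)}  # the base cube is always present
    snapshots = []
    for event in sorted(events, key=lambda e: e.time_s):
        if event.action is Action.CONNECT:
            cells.add(event.location)
        else:
            cells.discard(event.location)
        snapshots.append((event.time_s, frozenset(cells)))
    return snapshots


# Hypothetical three-action log.
events = [
    CubeEvent(3.2, Action.CONNECT, (1, 0, 0)),
    CubeEvent(7.9, Action.CONNECT, (1, 0, 1)),
    CubeEvent(12.4, Action.DISCONNECT, (1, 0, 1)),
]
snapshots = replay(events)
```

Replaying the log yields one structure per action, which is the form needed for
per-event offline analysis.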

After assessment, the collected data is analyzed offline to calculate the 3D similarity
between the participant’s structure s and the prototype p. Similarity is calculated for
each connect or disconnect event. For example, a five-step participant assembly will
result in five different similarity calculations. Similarity is defined in
Equation (1), where i is an intersection of s and p, and |i|, |s|, and |p| are the
number of cubes in i, s and p:

    Similarity = (|i| - (|s| - |i|)) / |p|        (1)

Similarity is maximized over all possible intersections i produced by
rotating or translating s. Intuitively speaking, similarity is the number of intersecting
cubes minus the number of remaining “extra” cubes in the participant’s structure,
normalized by the number of cubes in the prototype.
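
The similarity measure described above can be sketched as a brute-force search. This
is an illustrative implementation under stated assumptions, not the authors' code:
structures are sets of integer lattice coordinates, only the 24 proper rotations of
the cubic lattice are tried (mirror images are excluded), and candidate translations
are those that align some cube of s with some cube of p. The function names are
hypothetical.

```python
from itertools import permutations, product


def _perm_sign(perm):
    """Return +1 for an even permutation of (0, 1, 2) and -1 for an odd one."""
    inversions = sum(
        1 for a in range(3) for b in range(a + 1, 3) if perm[a] > perm[b]
    )
    return -1 if inversions % 2 else 1


def lattice_rotations():
    """Return the 24 proper rotations of the cubic lattice as (perm, signs)."""
    result = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            # Keep determinant +1 only, so mirror images are excluded.
            if _perm_sign(perm) * signs[0] * signs[1] * signs[2] == 1:
                result.append((perm, signs))
    return result


def similarity(s, p):
    """Similarity of structure s to prototype p (p must be non-empty).

    Computes (|i| - (|s| - |i|)) / |p| with |i| maximized over rotations and
    translations of s. Since |s| is fixed, maximizing |i| maximizes similarity.
    """
    s, p = set(s), set(p)
    best = 0
    for perm, signs in lattice_rotations():
        rotated = [
            tuple(signs[k] * cube[perm[k]] for k in range(3)) for cube in s
        ]
        # Try every translation that puts some cube of s onto some cube of p.
        for rx, ry, rz in rotated:
            for px, py, pz in p:
                dx, dy, dz = px - rx, py - ry, pz - rz
                overlap = sum(
                    (x + dx, y + dy, z + dz) in p for x, y, z in rotated
                )
                best = max(best, overlap)
    extra = len(s) - best  # cubes of s left outside the prototype
    return (best - extra) / len(p)


prototype = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0)}   # an L-shaped tetromino
rotated_copy = {(0, 0, 0), (0, 0, 1), (0, 0, 2), (0, 1, 2)}
partial = {(0, 0, 0), (1, 0, 0)}
line_of_five = {(k, 0, 0) for k in range(5)}
```

With at most ten cubes per prototype, this exhaustive search stays small: 24 rotations
times at most |s| x |p| translations.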

We make the similarity at task completion, calculated as described above, one of our
four assessment measures. The remaining three are: last connect, the time elapsed
from the start of the task to the last cube connect or disconnect; derivative, the
difference between two successively measured similarities in a task divided by the time
elapsed between those measurements (local “slope” of the similarity function), averaged
over all such pairs in a task; and zero crossings, the number of times the local
slope crossed zero. We sometimes use the terms “completion time”, “rate of progress”,
and “steadiness of progress” as substitutes for last connect, derivative, and zero
crossings.
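
The four measures can be computed from a per-event similarity trace along the
following lines. This is a sketch under assumptions: the function name and the
(time, similarity) input format are hypothetical, event times are assumed to be
strictly increasing, and a zero crossing is counted here as a sign change between
successive non-zero local slopes, which is one possible reading of the definition
above.

```python
def assessment_measures(samples):
    """Compute the four measures from [(time_s, similarity), ...] in time order.

    One sample is expected per connect or disconnect event.
    """
    times = [t for t, _ in samples]
    sims = [v for _, v in samples]

    # Local slope of the similarity function between successive events.
    slopes = [
        (sims[k + 1] - sims[k]) / (times[k + 1] - times[k])
        for k in range(len(samples) - 1)
    ]

    # Count sign changes between successive non-zero slopes.
    nonzero = [m for m in slopes if m != 0]
    crossings = sum(
        1 for a, b in zip(nonzero, nonzero[1:]) if (a > 0) != (b > 0)
    )

    return {
        "similarity": sims[-1],        # similarity at task completion
        "last_connect": times[-1],     # completion time
        "derivative": sum(slopes) / len(slopes) if slopes else 0.0,
        "zero_crossings": crossings,   # steadiness of progress
    }


# Hypothetical trace: the participant removes a correct cube at t = 6 s.
trace = [(2.0, 0.25), (4.0, 0.5), (6.0, 0.25), (10.0, 0.75), (12.0, 1.0)]
measures = assessment_measures(trace)
```

In this trace the slope goes positive, negative, then positive again, so the sketch
reports two zero crossings.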



4.3 Testing Cognitive Cubes 

We included in our experimental design four task types (Fig. 2). Intro tasks were 
simple practice trials, designed to introduce the participant to Cognitive Cubes. A 
cube appears on the display after each new connection, indicating the next cube to 
attach. The follow task type also provided step-by-step guidance, but the tasks were 
much more difficult. Match tasks provided no cube-by-cube guidance, but rather 
displayed a complete virtual prototype for the participant to construct using their own 
approach. In all three of these task types, the starting point for the participant’s
construction was the base cube. With reshape tasks the participant started from a more
complex initial condition (always the same 7-cube 3D construct); in all other
respects reshape was exactly like match.

Fig. 2. Cognitive Cubes task types (samples): intro task, follow task, match task,
reshape task, and the reshape initial condition shape

The participant sat at a table with only Cognitive Cubes placed in front of him. A 125
cm diagonal image was displayed in front of the viewer at a viewing distance of 185
cm using a digital projector, in a brightly lit room (Fig. 1).

The experiment was conducted with a strict written protocol read out loud to the 
participant. The participant was introduced to the system, the experiment, and its 
purpose, and then read an information letter. She was told that she might stop the 
experiment at any time, and asked to sign a consent form. The participant was given a 
short interview, answering questions concerning age, education, occupation, experi- 
ence in 3D design, construction sets, computer games, general health and handedness. 



4.4 Results at a Glance 

First, we performed an extensive pilot study which included 14 young, healthy par- 
ticipants who performed the entire cognitive assessment. An important lesson learned 
from the pilot study related to the difficulty of Cognitive Cubes. We found that many 
of our healthy, young participants faced difficulties with tasks that involved a rela- 
tively small number of cubes in a 3D arrangement. For example, several of the pilot 
study participants found a seemingly simple, five-cube follow task quite challenging, 
though eventually manageable. Matching a ten-cube prototype proved to be very 
challenging for several participants. Consequently, we decided that all prototype 
shapes would be restricted to at most ten cubes. 

To confirm and improve the sensitivity of Cognitive Cubes, we studied its response to
three factors known to correspond to differences in cognitive ability: participant age
(< 34, > 54), task type (follow, match and reshape), and shape type (2D, 3D).

Since cognitive ability declines gradually with increasing age, in this study we
expected younger participants to perform better than older participants. As the
cognitive load of a task increases, cognitive abilities are stressed, leading us to
expect better performance with task types that required less planning. Similarly, we
have already noted the heavier cognitive demands involved in working with 3D shapes. We
anticipated better performance with 2D shapes than with 3D shapes.

The cognitive sensitivity study included 16 participants, ranging in age from 24 to 86,
with 4 females and 12 males. Seven of the participants were young, seven were elderly,
and two were elderly with mild Alzheimer Disease (AD).

Fig. 3 provides a view of some of the study results. The figure presents the similarity 
(Equation 1) versus time, for a single Cognitive Cubes task. The task is a seven-cube, 
3D match task. The similarity measure curves are plotted for the 13 cognitive sensi- 
tivity study participants who performed the task. 

It is interesting to note that all the participants who began this task completed it, 
reaching a final similarity of 100%. The total time measure varies considerably be- 
tween the different groups: most of the young participants completed the task faster 
than most of the elderly participants. All of the participants accomplished the task 
more quickly than the single AD participant. 

Participants’ rate of progress, as manifested through the curve slope, or the derivative
measure, also differs between the groups. Most of the young participants have a
steeper similarity slope than the elderly participants. All the participants have a faster
rate of progress than the AD participant.

Fig. 3. Similarity vs. time; cognitive sensitivity study; single task (horizontal
axis: time in seconds)

Lastly, with the exception of the AD participant, all participants have steady progress
towards the goal, and thus no zero crossings (Section 4.2). However, the local slope
of the AD participant’s similarity curve crosses zero several times.

We analyzed the results using one 3-factor (2 age x 3 task type x 2 shape type)
ANOVA for each of the last connect, similarity, zero crossings, and derivative
measures. Because there were so few AD participants, we excluded them from any
analyses of variance. We also excluded the intro task type from the analyses.

All three factors produced main effects in line with our expectations. Participant per- 
formance varies significantly by age, with elderly participants needing more time to 
complete each task, and showing a low rate of progress. By all four dependent meas- 
ures, participant performance is also significantly affected by shape type. 2D shape 
construction is completed more quickly, more accurately, and with a higher and 
steadier rate of progress. Finally, task type also has significant effects on all four 
measures. Follow is the easiest of the task types, enabling quick completion and a 
high, steady rate of progress toward the target shape. However, shape similarity is 
lowest with the follow task. Participants perform the match and reshape tasks with 
roughly equal completion times and similarities, but the rate of progress in the match 
task type is higher and steadier. A thorough discussion of our cognitive sensitivity
study results is beyond the scope of this paper, and is presented elsewhere [14].



5 Cognitive Cubes and the Mental Rotation Test (MRT) 

Having studied the sensitivity of Cognitive Cubes to factors related to cognitive per- 
formance, we turn to a comparison of Cognitive Cubes to a known tool for 3D spatial 
assessment: the MRT. As we discussed in Section 3.3 the MRT has 3D and spatial 
components, like Cognitive Cubes, leading us to expect a strong relationship between 
the two assessments, particularly with 3D tasks. However, since the MRT does not 
include any of Cognitive Cubes’ constructional, planning, or motor task components, 
we might anticipate the relationship to be limited to simpler tasks such as follow. 



5.1 Participants and Methods 

The test comparison study’s 12 participants ranged in age from 18 to 36, with an
average age of 27.66 and a standard deviation of 5.61 years. Four of the participants
were female and eight were male. Participants were all volunteers recruited on and
off campus, none of whom participated in any other phase of the Cognitive Cubes
experiments. The procedure followed the general Cognitive Cubes experimental
methodology, except that participants took the MRT before and after the Cognitive
Cubes assessment. We refer to these MRT sessions as pre-Cognitive Cubes and
post-Cognitive Cubes, respectively.



5.2 Results

We start this section with a few selected comments from Cognitive Cubes partici- 
pants, based on post-session interviews. We tried to include negative feedback as well 
as positive, though most experiences were very positive. 

Male, healthy, 34 — “Cognitive Cubes were easier and more fun than the MRT, since 
they are tactile there’s less to think about”. 

Female, healthy, 28 — “MRT is more challenging than Cognitive Cubes. With Cogni- 
tive cubes you can always ‘try’, with the MRT you have to use only your imagina- 
tion”. 

Female, healthy, 29 — “MRT is stressful, feels like an exam. Cognitive Cubes is fun, 
you feel you are doing something. Less stress and there’s no time limit”. 

We performed our MRT/Cognitive Cubes comparisons using correlations. As will be
discussed later in this section, our post-Cognitive Cubes MRT reached ceiling,
meaning that most of the subjects achieved very high MRT scores after their Cognitive
Cubes session. Because these scores reached ceiling and lost sensitivity, correlations
of the post-Cognitive Cubes MRT with the Cognitive Cubes measures would be meaningless
and are not presented.

The correlations of pre-Cognitive Cubes MRT and Cognitive Cubes are presented in
Table 1, along with the probability that the correlations are not significantly different
from 0. Those correlations with high probability of being different from 0 are
presented in bold, underlined digits. The measure with the most significant overall
correlation (and the only one reaching marginal significance) is the derivative
measure. Correlations to zero crossings are low. Correlations to similarity are also
low, perhaps because similarities are uniformly high. Correlations to last connect,
like those to the derivative measure, are comparatively strong. Correlations are only
slightly stronger for 3D than 2D shapes, while correlations are strongest with follow
tasks, slightly weaker with match tasks, and completely untrustworthy with reshape
tasks.

Contrary to our expectations, both 2D and 3D shape types produce some good
correlations to the 3D MRT. We believe this may well be attributable to task difficulty.
While the MRT asks the user to perform a small set of relatively simple 3D mental
rotations, Cognitive Cubes challenges participants to construct a single shape, which
may be small or large, 2D or 3D. Which is more like the MRT: building from scratch
a complex 3D shape, or a simple 2D shape? The answer is unclear, hence the lack
of clarity in the shape type correlations.



Table 1. Pre-CC MRT/Cognitive Cubes correlations. Correlation significance: (p<.1) in bold,
(p<.05) underlined

Dependent                  Shape Type        Task Type
Measures        Overall    2D      3D        follow   match   reshape
Last connect    -0.38      -0.49   -0.35     -0.63    -0.35   -0.24
Similarity       0.03      -0.36    0.17      0.16    -0.09    0.08
Zero crossings  -0.23       0.07   -0.25     -0.14    -0.45    0.11
Derivative       0.51       0.38    0.57      0.43     0.50    0.34



Fig. 4. Post-Cognitive Cubes vs. pre-Cognitive Cubes MRT Scores



5.3 Surprises 

We compared MRT results obtained before (pre-Cognitive Cubes) and after (post- 
Cognitive Cubes) the Cognitive Cubes assessment. Interestingly, post-Cognitive 
Cubes MRTs are markedly improved (most in the 90th percentile). The improvement 
in the MRT score after the Cognitive Cubes sessions is illustrated in Fig. 4. Each 
graph point represents MRT results from a single participant, its X axis value the pre- 
Cognitive Cubes MRT score, and its Y axis value the post-Cognitive Cubes MRT 
score. 

While it is well known that repeating the MRT brings improved performance, the
improvements in this case are well above the normally reported repetition improvement
rate of roughly 5%. Although these results are preliminary, we believe they indicate
some of the potential that TUIs have as intuitive and automatic training aids.



6 Conclusion 

Is Cognitive Cubes a useful tool for assessment? Our experience certainly indicates 
great promise. Despite being a prototype, the ActiveCube hardware component stood 
up well to intense use and proved to be quite intuitive for our participants. In experi- 
mental evaluation, the system as a whole was sensitive to well-known cognitive fac- 
tors and compared favorably to an existing assessment. Automation introduced a 
previously unachievable level of reliability and resolution in 3D measurement and 
scoring. Despite all this, Cognitive Cubes is not yet ready for regular use.

How might Cognitive Cubes be prepared for use in the field? The gap between a good 
prototype and a reliable tool is a large one. Use in clinical or research settings would 
require significant improvements in cost, reduction of connection and system errors,
and improvements in structural strength. These are fairly typical requirements for the
development of any technology. In addition, extensive testing would be required to 
identify the distribution of scores typically achieved with Cognitive Cubes. In this 
way, assessors can reliably decide whether or not a score indicates impairment. 

One of the most distinctive strengths of Cognitive Cubes is its ability to capture each
step of the task progress, closely mirroring the cognitive processing of the participant.
With
the same data used to build similarity graphs it is also possible to build decision trees 
reflecting the participant’s chosen path through the space of possible cube-by-cube 
construction sequences. This dynamic process can be probed even more deeply by 
attempting to categorize participant trees according to cognitive ability. 

The improvement from pre-Cognitive Cubes MRT to post-Cognitive Cubes MRT 
results is unexpected, but very intriguing. Could Cognitive Cubes be used as a form 
of cognitive therapy or training, for example in rehabilitation? Our current results are
preliminary, but we see great promise for further probing in this direction.

Cognitive Cubes, to our knowledge, is the first system for the automated assessment 
of 3D spatial and constructional ability. Cognitive Cubes makes use of ActiveCube, a 
3D spatial TUI, for describing 3D shape. Cognitive Cubes offers improved sensitivity 
and reliability in assessment of cognitive ability and ultimately, reduced cost. Our 
experimental evaluation with 43 participants confirms the sensitivity and reliability of 
the system. 

We see Cognitive Cubes as a proof-of-concept demonstrating our research goal, 
showing that a specialized spatial TUI closely tied to an application can offer
substantial benefits over existing solutions and suggest completely new methodologies
for approaching the applied problem.

We also see Cognitive Cubes as a practical and successful example of our TUI design 
heuristics being put to work. Our choices during the design of Cognitive Cubes, for 
example, selecting a spatial application domain, and choosing a very intuitive spatial 
mapping between the interface and the task, were all closely guided by our TUI heu- 
ristics. We believe that the success of Cognitive Cubes should also be attributed to the 
heuristics that guided the design.



Abstract. Many research efforts in computer science have drawn their fundamental
ideas from biology. Genetic algorithms and neural networks are the best-known
paradigms in this category. Recently, many ideas from the immune system have been
used to detect computer viruses and worms. Since the first computer virus was found,
scanning detection has been the primary method in virus detection systems. As
computer viruses and worms become more complex and sophisticated, the scanning
detection method is no longer able to detect the various forms of viruses and worms
effectively. Many anti-virus researchers have proposed various detection methods,
including artificial immune systems, to cope with these characteristics of computer
viruses and worms. This paper discusses the principles of the artificial immune
system and proposes an artificial immune based virus detection system that can detect
unknown viruses.



1 Introduction 

Since the first computer virus appeared in 1981, viruses have evolved continuously as
computing environments such as operating systems have advanced. This frequent evolution
has led anti-virus researchers to develop new detection methods continuously. However,
most anti-virus systems are still based on signature scanning, since other mechanisms
suffer from high false positive rates or slow detection in real situations [11].
Recently, several anti-virus research efforts have focused on biologically inspired
systems, such as the negative selection method, as novel virus detection systems [6,11].

In this paper, we propose a new computer virus detection approach that employs
the artificial immune system. The artificial immune system is based on the human immune
system, which recognizes, memorizes, and responds to virus intrusion. Anomaly detection
and adaptability are important properties of the artificial immune system. The proposed
artificial immune based system exploits these properties and detects unknown computer
viruses.

In the next chapter, we survey the human immune system. The artificial immune
system is discussed in detail in Chapter 3 and Chapter 4. In Chapter 5, the proposed
artificial immune based virus detection system is described. Simulation results and
the direction of future research are discussed in Chapter 6 and Chapter 7,
respectively.



2 Human Immune System 

The human immune system (HIS) protects the body from pathogens such as viruses using
distributed immune cells. The HIS is a complex system that consists of cells, molecules,
and organs. Elements of the HIS cooperate with each other to identify “self” (own cells)
and “non-self” (foreign cells or pathogens). When pathogens enter the body, they are
detected and eliminated by the HIS. The HIS also remembers each infection type and
pattern. If the same pathogen enters the body again, the HIS copes with it more
effectively than the first time. These processes are performed in a distributed
environment. There are two inter-related systems in the HIS: the innate immune system
and the adaptive immune system [23].

The body is born with the innate immune system, which has the ability to recognize
certain patterns of particular pathogens. The innate immune system is based on a set of
receptors known as Pattern Recognition Receptors (PRRs). Each receptor matches a
particular molecular pattern of a pathogen, called a Pathogen Associated Molecular
Pattern (PAMP). PAMPs originate only from foreign pathogens, not from the body’s own
cells. Therefore, recognition by the PRRs means that pathogens are present. This
recognition serves to boost the adaptive immune system, and it provides the capability
of distinguishing between self and non-self.

When the innate immune system recognizes certain pathogens, it emits a burst of
stimulatory signals that lead to T cell activation. T cells carry antigen receptors, and
activating T cells is the start of the adaptive immune response. An activated T cell
stimulates B cells, which then proliferate and differentiate into non-dividing terminal
cells that secrete antibodies. Finally, the antibodies neutralize the recognized
pathogens. While activated B cells secrete antibodies, T cells do not; rather, T cells
play a central role in controlling the B cell response in the adaptive immune system.

Negative selection is another element of adaptive immune recognition, one that does
not need innate immune recognition. Negative selection is the process of selecting
receptors that do not detect self. Receptors have distinct protein structures, which are
formed by a pseudo-random genetic process before negative selection is applied.
According to its molecular structure, a receptor detects particular cells, which are
identified by molecular structures called antigens. If pathogens intrude into the human
body, immune cells with matching receptors are able to detect and neutralize them. In
this paper, we propose an artificial immune system based on negative selection
algorithms. The system generates detectors that never match normal programs. Detectors
are then classified into virus detectors and others; this classification is based on
the clonal selection of the HIS.

Clonal selection supports the classification ability of the HIS. Although the receptors
of immune cells are generated randomly, the adaptive immune system retains the useful
immune cells among them, that is, those able to detect a current intrusion. When the
receptors of immune cells match the antigens of pathogens, T cells produce clones. This
process results in the expansion of the corresponding T cells, so that the adaptive
immune system responds rapidly to the same pathogens. In short, the adaptive immune
system generates immune cells randomly, selects those that detect an intrusion, and
expands them so that the human immune system can respond rapidly. This process is called
clonal selection.

The HIS is a very complex system for protecting the human body against harmful
pathogens. The main purpose of the HIS is to classify all cells as self or non-self. To
do this, the HIS learns and memorizes useful antibodies through negative selection and
clonal selection in a distributed environment. These mechanisms have inspired work in
several areas of computer science, such as pattern classification, distributed
architectures, and anomaly detection. In the next chapter, we enumerate several systems
based on the biological immune system that are used in computer security. They are
called artificial immune systems.




Fig. 1. Taxonomy of artificial immune system 



3 Artificial Immune System in Computer Security 

The artificial immune system is one of the biologically inspired systems, like neural
networks and genetic algorithms. It is based on several capabilities of the biological
immune system. In computer security, an artificial immune system should have anomaly
detection capability to defend against unknown intrusions. Adaptability is also a
necessary property, so that the system can learn unknown intrusions and respond to
learned intrusions quickly. Other properties such as distributability, autonomy,
diversity, and disposability are also required for the flexibility and stability of an
artificial immune system [14].

Artificial immune systems can be divided into three categories according to which
computer component is identified as a protected cell, as shown in Figure 1.

The first category, protecting static files, treats a file containing program code
as a cell. Forrest proposed a method for change detection using a negative selection
algorithm [6]. Forrest’s artificial immune system randomly generates check codes called
detectors and eliminates those that recognize self (benign programs). This process is
called the negative selection algorithm. All programs or data matched by detectors are
marked as non-self (malicious code). Jeffrey O. Kephart proposed an artificial immune
model that consists of monitoring, scanning, and biological immunology [11]. In his
model, known viruses are detected by a scanner based on weak signatures, and unknown
viruses are collected and analyzed using decoy programs. A decoy program is a dummy
program that should never change; if a decoy program is modified, a virus infection is
indicated. The decoy program thus acts as a trap for collecting unknown viruses. The
Distributed Computer Virus Immune System (DCVIS) combines negative selection and decoy
programs. It uses decoy programs to detect unknown viruses and employs negative
selection to identify the detected unknown viruses [16].

The second category, protecting processes, treats a process as a cell. Forrest
proposed matching system call sequences [10]. The mechanism is the same as self-nonself
discrimination using negative selection; the difference is that detectors are matched
against system call sequences rather than program code.

The third category, protecting a network, treats a host as a cell. The kill-signal
mechanism was proposed by Kephart, and NAVEX is an immune system proposed by Symantec
[11,17]. Both immune systems propagate the signatures of detected viruses to protect
hosts connected via a network, as the biological immune system does: when an immune cell
detects a virus, the cell spreads a chemical signal in order to duplicate the same
immune cells. For network intrusion detection, Williams et al. proposed the Computer
Defense Immune System (CDIS), which protects a local network using an artificial immune
system. CDIS is a distributed system with redundant links and decentralized control.
These features provide capabilities found in the biological immune system, such as fault
tolerance and the absence of a single point of failure [22].

The proposed artificial immune based virus detection system belongs to the first
category. It extracts suspicious code detectors from all suspicious code based on the
negative selection mechanism of the biological immune system. Then, it distinguishes
virus detectors from the extracted detectors. The proposed detection system is described
in detail in Chapter 5.



4 Computer Viruses Detection and Artificial Immune System 

Immunologists have traditionally described the task of the human immune system as the
problem of distinguishing "self" from dangerous "non-self" and eliminating the dangerous
non-self. Similarly, computer security can be viewed as the problem of distinguishing
benign programs from malicious programs such as computer viruses or worms. Eugene H.
Spafford describes a virus from a biological point of view and shows several
characteristics of a virus as artificial life [1]. Since then, many anti-virus
researchers have studied artificial immune systems to protect a system from intrusions,
including viruses and worms.

Current viruses are polymorphic and fast spreading. These characteristics impose
several problems on current anti-virus systems. First, it is hard to extract a virus
signature from a polymorphic virus, since the virus changes its structure in every
replication. Second, even if we extract the virus signature, a virus that spreads very
quickly over the network will already have damaged many computer resources. Therefore,
we need innovative detection mechanisms instead of scanning methods based on virus
signatures. In this chapter, current virus detection methods and some artificial immune
based virus detection systems are briefly discussed.

Current commercial anti-virus systems are virus scanners. Virus scanners look for
virus signatures in memory or in program files. Once suspicious code is found, the
scanner alerts the user that the system is infected. Virus scanners are very fast, but
managing the huge number of signatures needed for scanning is becoming more and more
difficult. In addition, virus scanners cannot detect unknown or polymorphic viruses.

Many anti-virus systems use heuristic algorithms to detect polymorphic viruses.
Since most polymorphic virus techniques rely on insertion and mutation, a heuristic
algorithm detects a polymorphic virus through frequency analysis of particular strings
or codes. Although an anti-virus system with a heuristic algorithm can detect some
polymorphic viruses, it requires a specific heuristic algorithm for each polymorphic
virus. Moreover, recent complex polymorphic viruses, namely encrypted polymorphic
viruses, cannot be detected by scanners equipped with heuristic algorithms. In contrast
to insertion and mutation, the encryption technique generates new code that does not
match any particular code at all [3] [4].

A change detector is a detection method that uses integrity checks. The change
detector scans the executable program files and records information about the files
before the system starts. It then periodically rescans the program files to check
their integrity. If there is a difference between the recorded and current check-codes,
a virus could have caused the change.
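
The integrity check described above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names are hypothetical, and
SHA-256 from the standard library stands in for whatever check-code a real change
detector would record.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents (assumed check-code)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_baseline(paths):
    """Record a check-code for every protected program file."""
    return {str(p): file_digest(p) for p in paths}


def changed_files(baseline):
    """Return the files whose current check-code differs from the recorded one."""
    return sorted(
        path
        for path, digest in baseline.items()
        # A missing file also counts as a change.
        if not Path(path).exists() or file_digest(path) != digest
    )
```

A periodic scan would call `changed_files(baseline)` and report every path it returns
as possibly infected; a legitimate update would require recording a new baseline.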

The recent change detector using the negative selection algorithm, proposed by
Stephanie Forrest, differs from conventional ones based on check codes such as MD5 or
CRC [6]. Forrest’s model maintains signatures that do not match any of the protected
files. Such a signature is called a “detector”. The system monitors the protected files
by comparing them with the detectors. If a detector matches any file, a change is known
to have occurred. The algorithm for generating the detectors is computationally
expensive, but checking is cheap. It would be difficult to modify a file and then alter
the corresponding detector.
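
A toy version of detector generation by negative selection might look as follows. It is
a sketch under simplifying assumptions rather than the method of [6]: detectors are
short random byte strings, a detector "matches" a file when it occurs anywhere in the
file, and the names, the tiny alphabet, and the attempt limit are all hypothetical.

```python
import random


def matches(detector, data):
    """Assumed matching rule: the detector occurs somewhere in the data."""
    return detector in data


def generate_detectors(self_files, count, length, alphabet=b"ab", seed=0):
    """Negative selection: keep random candidates that match no self file."""
    rng = random.Random(seed)
    detectors = set()
    attempts = 0
    while len(detectors) < count and attempts < 100_000:
        attempts += 1
        candidate = bytes(rng.choice(alphabet) for _ in range(length))
        # Discard any candidate that recognizes self.
        if not any(matches(candidate, data) for data in self_files):
            detectors.add(candidate)
    return detectors


def is_changed(data, detectors):
    """Checking is cheap: a change is flagged when any detector matches."""
    return any(matches(d, data) for d in detectors)
```

Generation repeatedly tests candidates against every protected file, which is the
expensive part, while monitoring only needs one pass over the detectors per file.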

Related research using a simple and efficient definition of abnormal behavior was
proposed by Forrest [10]. Forrest collects normal behavior as short sequences of system
calls. Next, abnormal behaviors are generated using the negative selection algorithm,
which produces detectors that recognize abnormal behavior. In this case, a detector is
a set of system call sequences of an abnormal program. This approach can detect a
malicious program if the system call sequences of the monitored program are the same as
those of a detector.
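
The idea of matching short system call sequences can be illustrated with the sketch
below. It makes several assumptions that are not in the text: traces are lists of call
names, sequences are fixed-length sliding windows of length k, and detectors are found
by exhaustively enumerating all windows over a small vocabulary instead of by random
generation, which is only feasible for toy vocabularies.

```python
from itertools import product


def windows(trace, k):
    """Return the set of length-k sliding windows of a system call trace."""
    return {tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)}


def build_detectors(normal_traces, vocabulary, k):
    """Keep every length-k sequence that never occurs in normal behavior."""
    normal = set()
    for trace in normal_traces:
        normal |= windows(trace, k)
    return {seq for seq in product(vocabulary, repeat=k) if seq not in normal}


def is_abnormal(trace, detectors, k):
    """A trace is abnormal when one of its windows equals a detector."""
    return not windows(trace, k).isdisjoint(detectors)
```

For example, with normal traces made of `open`, `read`, and `close` calls, a monitored
trace containing `open` followed directly by `write` would hit a detector.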

The most important characteristics of current computer viruses are that they are
polymorphic and spread quickly, whereas existing anti-virus systems are not flexible
enough to cope with these properties. The negative selection algorithm, which supports
anomaly detection and distributed environments, is a good candidate for designing an
effective polymorphic virus detection system. It can also protect a network from
fast-spreading viruses.



5 Proposed Artificial Immune Based Virus Detection System 

The negative selection algorithm and decoy programs are important methods in
artificial immune systems. The negative selection algorithm is inspired by a random
genetic process. However, this mechanism has a weakness that comes from the difference
between the human body and a computer system: the state of the human body is stable,
whereas that of a computer system is not, and the negative selection algorithm is
suited to stable systems. The idea of the decoy program comes from the distributability
and diversity of the immune system. With a decoy program, however, there is no guarantee
that a virus will attack the decoy. The proposed Artificial Immune based Virus Detection
System (AIVDS) compensates for these weaknesses and is suitable for a dynamically
changing environment such as a computer system.




Fig. 2. Structure of proposed virus detection system

AIVDS is a signature learning system for detecting unknown viruses. AIVDS classifies
incoming or changed (patched) programs into legitimate programs and viruses. The
process of AIVDS consists of the following three steps. In the first step, AIVDS
assumes that all existing programs are legitimate. In the next step, all incoming and
changed programs are classified as suspicious programs. Finally, AIVDS selects virus
programs from these suspicious programs using a detection method based on virus
behavior. Hereafter, we refer to a legitimate program as self and a suspicious program
as non-self.

AIVDS consists of a signature extractor and a signature selector, as shown in Figure 2.
The signature extractor produces signatures of non-self programs. Its main operation is
selecting bit strings that do not match any self code at the same position. Therefore,
the signature extractor produces non-self signatures that never match any self program.
The non-self signatures produced are collected and analyzed by the signature selector.

The signature selector compares non-self signatures with one another for similarity.
Since viruses tend to infect other programs, if the same non-self signature appears
frequently, the signature selector classifies it as a virus signature, and such
signatures are used by the virus scanner. Notice that these non-self signatures never
match any self program. A less frequent signature is assumed to be a signature of self,
and we classify any program that contains one of these self signatures as self. This
process gives the proposed AIVDS its adaptive capability.
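
The frequency-based decision can be sketched as below. This is a deliberate
simplification: it counts exactly identical signatures instead of computing similarity
values, and both the function name and the `min_count` threshold are hypothetical, not
values from the paper.

```python
from collections import Counter


def select_virus_signatures(signatures, min_count=2):
    """Split non-self signatures into frequent (virus) and infrequent ones.

    signatures: one hashable signature per non-self program, for example a
    tuple of (offset, bit_string) pairs.
    min_count: hypothetical frequency threshold chosen for illustration.
    """
    counts = Counter(signatures)
    virus = {sig for sig, n in counts.items() if n >= min_count}
    other = set(counts) - virus
    return virus, other
```

Signatures in the first returned set would be handed to the virus scanner, while those
in the second set would be treated as belonging to self.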

The details of each step are discussed in the following three sections.



5.1 Signature Representation 

The AIVDS generates signatures to recognize self and non-self programs. The size of
the signature extraction region, the part of a program that is analyzed for signature
extraction, is always constant. Since most infected files execute the virus code first,
the entry point of most infected programs points to virus code. Therefore, we take the
entry point of the program as the starting point of the signature extraction region. A
signature consists of several pairs of an offset and a bit string. The number of pairs,
the offsets, and the bit strings are selected by the signature extractor.



Fig. 3. Signature representation



5.2 Signature Extractor 

The signature extractor produces a non-self program signature that never matches any
self-program signature. This process consists of the following steps.

• Step 1: Divide a signature extraction region into several equally sized comparison
units.

• Step 2: Compare the signature extraction region of the non-self program with that of
each self-program.

• Step 3: If two comparison units at the same position are identical, discard that
comparison unit from the signature extraction region of the non-self program. Continue
this process over the signature extraction regions of all self-programs.



Fig. 4. Generating signature of non-self program

In Figure 4, the comparison unit is 1 byte. Non-self is compared with self1 and self2.
When the two comparisons are finished, the first and fourth units are marked. After all
the comparisons with the other self-programs are done, the signature of non-self
consists of the remaining unmatched units. Since all comparison units that are identical
to a self-program’s unit are marked as useless, the generated signature never matches
any self-program.
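
The three extraction steps and the Figure 4 example can be sketched in Python as
follows. This is a minimal illustration, not the authors' implementation: the function
name, the representation of a signature as a list of (offset, bytes) pairs, and the
assumption that every input already starts at the program entry point are hypothetical.

```python
def extract_signature(non_self, self_programs, region_size, unit_size):
    """Return the (offset, unit) pairs of non_self that match no self program.

    Each program is a bytes object assumed to start at its entry point.
    A trailing partial unit is ignored.
    """
    region = non_self[:region_size]
    signature = []
    # Step 1: walk the region in equally sized comparison units.
    for offset in range(0, len(region) - unit_size + 1, unit_size):
        unit = region[offset:offset + unit_size]
        # Steps 2 and 3: drop the unit if any self program has the same
        # bytes at the same offset; otherwise keep it in the signature.
        if all(prog[offset:offset + unit_size] != unit for prog in self_programs):
            signature.append((offset, unit))
    return signature
```

With a 1-byte comparison unit, a non-self region whose first byte equals that of one
self-program and whose fourth byte equals that of another keeps only its second, third,
and fifth units, mirroring the marking shown in Figure 4.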



5.3 Signature Selector 

After we obtain the signatures of non-self programs, we need to classify each non-self
program as a virus or a normal program. We can decide whether a signature indicates a
virus according to the behavior of viruses: viruses tend to infect other programs. When
a virus infects other programs, the signature extractor generates signatures from the
same virus code, because each infected program has been changed by the same virus.
Therefore, checking how frequently the same signature occurs is equivalent to checking
the spread of the same virus code.

The signature selector calculates the similarity values of non-self signatures, as
shown in Figure 5. The comparison factors are the bit sequence and the offset of each
comparison unit in the signature extraction region. If both factors of two comparison
units are equal, the similarity function adds one to the similarity value. When
consecutive comparison units are equal, the similarity value is higher. For example, if
two equal comparison units are adjacent, the similarity value is 3 (11 in binary). In
other words, when n consecutive comparison units are equal, the similarity value is
2^n - 1.
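
One way to read this similarity rule is sketched below, assuming that each run of n
consecutive equal units contributes 2^n - 1 and that the contributions of separate runs
are summed. The function name and the (offset, unit) signature representation are
assumptions made for illustration.

```python
def similarity(sig_a, sig_b, unit_size):
    """Sum 2**n - 1 over every run of n consecutive equal comparison units.

    Signatures are lists of (offset, unit_bytes) pairs. Two units are equal
    when both the offset and the bytes agree.
    """
    common = sorted(offset for offset, _unit in set(sig_a) & set(sig_b))
    total = 0
    run = 0
    previous = None
    for offset in common:
        if previous is not None and offset == previous + unit_size:
            run += 1                      # extends the current run
        else:
            total += (1 << run) - 1       # close the previous run (0 if none)
            run = 1
        previous = offset
    total += (1 << run) - 1               # close the final run
    return total
```

Under this reading, one isolated equal unit plus one run of two adjacent equal units
gives 1 + 3 = 4, and three adjacent equal units give 7.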

Note that signatures of the same program are more similar than signatures of
different programs. Therefore, the similarity value between signatures of the same
program code is higher than that between signatures of different programs. The
threshold value for separating same and different programs is determined by analyzing
the similarity values over all non-self programs.

Comparison value = 1 + 11 (binary) = 1 + 3 = 4

Fig. 5. Similarity between two signatures



6 Simulation 

AIVDS extracts signatures of non-self programs based on self-programs. Signature
size and similarity value are important factors in AIVDS. As the number of self-programs
that have the same comparison units at the same positions increases, the size of the
signature of a non-self program decreases. In the worst case, AIVDS cannot generate a
signature for a particular program, because every comparison unit is marked as useless.
The similarity value is used for selecting virus detectors from among the suspicious
signatures. In this chapter, we show simulation results for different signature
extraction region and comparison unit sizes. The simulation parameters are listed in
Table 1.



Table 1. Simulation parameters

Parameters                          Variables
The number of self-programs         1385 execution files
The number of non-self programs     160 execution files (3 virus infected files)
Signature extraction region size    500Byte, 1Kbyte, 5Kbyte, 10Kbyte
Comparison unit size                1Byte, 2Byte, 3Byte



Figure 6 shows the relation between signature size and two parameters: the size of
the signature extraction region and the size of the comparison unit. As the signature
extraction region and the comparison unit grow, the signature size grows. A larger
signature includes richer information about the related non-self program, but a small
signature is more effective when used for scanning viruses. Signatures larger than 1KB
are not feasible as virus-scanning signatures. Moreover, the percentage of files with
a zero-size signature is independent of the sizes of the signature extraction region
and the comparison unit. The number of files with zero signature size is about 14
(8.75%) in all cases. Therefore, we chose two effective sizes for the signature
extraction region: 1Kbyte and 500Byte. We simulated the comparison for signature
selection on the signatures of non-self programs extracted with these two sizes.

Fig. 6. Simulation using various sizes of SER: Graphs show signature size for Signature
Extraction Region (SER) sizes of 10Kbyte, 5Kbyte, 1Kbyte, and 500Byte. Each graph has
three parameters of comparison unit: 1Byte, 2Byte, and 3Byte.

Similarity values between the signatures of non-self programs are shown in Figure 
7. When the signature extraction region is 1Kbyte and the comparison unit is 1Byte, 
88% of the signatures of non-self programs have a similarity value of zero. When the 
signature extraction region is 500Byte and the comparison unit is 1Byte, 92% of them 
have a similarity value of zero. Since virus-infected files make up 1.875% of the 
non-self programs (3 of 160), ideally 98.125% of the signatures should have a 
similarity value of zero; we could then classify every non-self signature whose 
similarity value is greater than zero as a virus signature. However, the percentage of 
non-zero signatures is still 6% even when the extraction region size is 1Kbyte and the 
comparison unit size is 3Byte. We therefore need to determine a threshold value to 
classify non-self programs into normal programs and viruses. 
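
The thresholding step can be illustrated with a short sketch. The code below is not the AIVDS implementation; it assumes that a similarity value has already been computed for each non-self signature and that the expected fraction of virus signatures is known (3 of 160 in this experiment). The function name `select_virus_signatures` and the similarity values are hypothetical, chosen only to show how a threshold separates a few highly similar signatures from the rest.

```python
def select_virus_signatures(similarities, expected_virus_fraction):
    """Pick a threshold so that only the expected fraction of signatures passes.

    Assumption for illustration: the threshold is set at the similarity of
    the k-th highest signature, where k is the expected number of viruses.
    """
    n = len(similarities)
    k = max(1, round(n * expected_virus_fraction))
    # Rank signature indices from most to least similar.
    ranked = sorted(range(n), key=lambda i: similarities[i], reverse=True)
    selected = sorted(ranked[:k])
    threshold = min(similarities[i] for i in selected)
    return threshold, selected


# Hypothetical data: 150 signatures with zero similarity, 7 with small
# non-zero similarity, and 3 with very high similarity (the viruses).
similarities = [0.0] * 150 + [5.0e6] * 7 + [2.0e8, 3.5e8, 1.2e9]
virus_fraction = 3 / 160  # share of infected files among non-self programs

threshold, selected = select_virus_signatures(similarities, virus_fraction)
zero_share = similarities.count(0.0) / len(similarities)
```

With these made-up values the three high-similarity signatures are selected, and any threshold lying in the gap between the two non-zero groups would select the same three.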

When the signature extraction region is held at the same size, the similarity value 
increases rapidly as the comparison unit grows, because the consecutively extracted 
signature is larger. A larger comparison unit also widens the gaps between similarity 
values. When the signature extraction region is 1Kbyte and the comparison unit is 
3Byte, we can easily find a similarity threshold of 1.E+08 that classifies signatures of 
non-self programs as viruses, because the ideal percentage of signatures of normal 
programs is 98.125%. With a similarity threshold of 1.E+08, three signatures are 
selected by the signature selector. A virus scanner using these signatures can detect 
virus-infected files. 

[Figure 7 panels: Percentage of Files as Similarity (SER 1KByte); Percentage of Files as 
Similarity (SER 500 Byte). Horizontal axis: Similarity.] 

Fig. 7. Percentage of files as similarity: The percentage of files is shown for SER 1Kbyte and 
500Byte. Each graph has three comparison unit settings: 1Byte, 2Byte, and 3Byte. 

Figure 8 shows the behavior of the synthetic virus. The sample of self-programs is 
the same as in the previous simulation, and the parameters are those selected for the 
general case: SER 1Kbyte, comparison unit 3Byte, and similarity threshold 1.E+08. 
A non-self program is inserted at each stage as in the previous simulation. The only 
difference is that the same synthetic virus is inserted at every stage between 100 and 
150 with a probability of 0.3. At the same time, the inserted synthetic virus infects 
other self-programs at every stage with a probability of 0.3. During this period, the 
death rate of the synthetic virus is 0.33. The figure shows that the number of similar 
signatures (those whose similarity value is higher than the threshold) increases from 
stage 100 and decreases from stage 150. After stage 200, the number of similar 
signatures varies between 0 and 5, which is very small compared with the period of 
virus propagation (between 100 and 200). Therefore, AIVDS is able to detect virus 
propagation behavior. 

[Figure 8 plot: Virus propagation behavior. Horizontal axis: Simulation stages.] 

Fig. 8. Virus propagation behavior in the AIVDS. With SER 1Kbyte and comparison unit 
3Byte, virus propagation started at stage 100 and stopped at stage 150. 



7 Conclusion 

An artificial immune system, like the biological immune system, is based on 
distinguishing “self” from “non-self”. However, previous artificial immune systems, 
especially those using the negative selection algorithm, are not feasible for dynamic 
computer systems. In this paper, we proposed the Artificial Immune based Virus 
Detection System (AIVDS), which is feasible for dynamic computer systems. AIVDS 
is a signature learning system for detecting unknown viruses. It produces non-self 
signatures from suspicious programs and classifies them into those of normal 
programs and those of viruses. Since viruses tend to infect other programs, if similar 
non-self signatures appear frequently, AIVDS classifies them as virus signatures for 
virus scanning. This process gives AIVDS its adaptive capacity. 

In the simulation of the proposed AIVDS with a 1Kbyte Signature Extraction 
Region (SER) and a 3Byte comparison unit, 94% of the extracted signatures were 
completely different; in other words, their similarity values were zero. The remaining 
6% of the signatures, including the virus signatures, had distinguishable similarity 
values. In particular, the 2% that were virus signatures had relatively high similarity 
values. The proposed AIVDS classifies suspicious non-self programs into normal 
programs and viral programs. Using a threshold on the similarity value, AIVDS can 
select virus signatures. A virus scanner using these signatures can detect 
virus-infected files correctly. AIVDS also detects the propagation behavior of a virus. 


Abstract. The conventional Difference of Gaussian (DOG) filter is commonly 
used to model the early stage of visual processing. However, the convolution 
operation used with DOG does not explicitly account for the effects 
of disinhibition. Because of this, complex brightness-contrast (B-C) illusions 
such as the White’s effect cannot be explained using DOG filters. 
We discovered that a model based on lateral disinhibition in biological 
retinas allows us to explain subtle B-C illusions. Further, we show that 
a feedforward filter can be derived to achieve this operation in a single 
pass. The results suggest that contextual effects can be processed 
through recurrent disinhibition. Such a context-sensitive structure might 
be employed to improve the network robustness of visual capturing or 
displaying systems. Another potential application of this algorithm could 
be automatic detection and correction of perceived incoherences where 
accurate perception of intensity level is critical. 



1 Introduction 

Brightness-contrast (B-C) illusions allow us to understand the basic processes in 
the early visual pathway. B-C illusions can become very complex, and a complete 
explanation may have to be based on a multi-stage, multi-channel model, with 
considerations of top-down influences [1,2,3]. In this paper, however, we will focus 
on the very early stages of visual processing, and see how far we can exploit low- 
level mechanisms observed in biological vision systems toward explaining B-C 
illusions. 

For example, the dark illusory spots at the intersections in the Hermann grid 
(Figure 1A) are due to lateral inhibition in the retina and the lateral geniculate 
nucleus (LGN) [4]. The visual signal in the eye is generated by the photoreceptor 
cells, and then it is passed through bipolar, horizontal, and amacrine cells and 
finally goes to LGN. Lateral inhibition is the effect observed in the receptive 
field where the surrounding inhibits the center area. When the stimulus is given 
in the receptive field, the central receptors produce an excitatory signal, while 
the cells in the surrounding area send inhibition through the bipolar cells to the 
central area [5]. (Difference of Gaussian, or DOG, filter [6] is commonly used to 
simulate such a process.) Figure 1B and C show such an effect by using DOG 
filters. The plot on the right shows the brightness level of the middle row, and 
the dark illusory spots are clearly visible (p1, p2, and p3). 

Fig. 1. The Hermann grid illusion. A. The Hermann grid illusion. The intersections 
look darker than the streets. B. The output using a conventional DOG filter. C. 
The plot of brightness level prediction (from B). To measure the average response, we 
took the column-wise sum of rows 27 to 29. Note that the illusory spots (at positions 
p1, p2, and p3) have a brightness value much higher than the periphery. The conventional 
operation cannot explain why we perceive the periphery to be brighter than the 
dark illusory spots. 

However, DOG filters alone cannot account for more complex visual B-C 
illusions. For example, in the Hermann grid illusion, although the illusory spots 
are explained fairly well, the conventional DOG model cannot explain why 
the periphery (figure 1A, to the left) appears brighter than the illusory spots 
(figure 1A, to the right). The DOG output is counter to our perceived experience. 
The reason for this failure is that the center of the DOG in the peripheral area 
receives inhibition from all directions, which results in a weaker response than 
at the intersections in the grid, which only receive inhibition from four directions. 
Moreover, the White’s effect [7] (figure 2A) cannot be explained using the 
conventional DOG filter. As shown in figure 2B, the output using conventional 
DOG filters gives the opposite result: the left gray patch on the black strip has a 
lower output value than the one on the white strip. On the contrary, we perceive 
the left gray patch on the black strip as brighter than the one on the right. 

Anatomical and physiological observations show that the center-surround 
property in early visual processing may not be strictly feed-forward, and that it 
involves lateral inhibition and, moreover, disinhibition. Hartline et al. used Limulus 
optical cells (figure 3) to demonstrate lateral inhibition and disinhibition 
effects in the receptive field [8]. (Note that disinhibition has also been found in 
vertebrate retinas, such as those of tiger salamanders [9] and mice [10].) Disinhibition 
can effectively reduce the amount of inhibition when there is a large 
area of bright input, which might be the solution to the unsolved visual illusion 
problem. 

In this paper, unlike DOG, we explicitly model disinhibition to derive a filter 
that is able to explain a wider variety of B-C illusions than the conventional 
DOG filters. In the following, we first review the model of Hartline et al. [8, 
11,12], and introduce our model, which is called the Inversed DOG model (or 
IDOG), and show how it is derived. The next section presents the results for various 
illusions. Finally, the issues raised by our model are discussed, followed by the 
conclusion. 

Fig. 2. The White’s effect. A. The White’s effect. The gray patch on the left has 
the same gray level as the one on the right, but we perceive the left to be brighter 
than the right. B. The output using a conventional DOG filter. C. The brightness level 
of the two gray patches calculated using a conventional DOG filter. As in the previous 
figure, we added up rows 10 to 19 in the output to get the average response. Note 
that the left patch has a lower average value (below zero) than the right patch (above 
zero). The result contradicts our perceived brightness. 



2 Hartline-Ratliff’s Model of Disinhibition 

Experiments on Limulus optical cells showed that the disinhibition effect is 
recurrent (figure 3). The final response of a specific neuron can be considered as 
the overall effect of the response from itself and from all other neurons. The 
conventional convolution operation using the DOG filter does not account for the 
effect of disinhibition, which plays an important role in the final response. The 
final response of each receptor resulting from a light stimulus can be enhanced 
or reduced by the interactions through inhibition from its neighbors. It turns out 
that this effect can help solve some unsolved problems of B-C illusions; thus, it 
may be important to explicitly account for disinhibition. 

The Hartline-Ratliff equation describing disinhibition in Limulus can be 
summarized as follows [8,11,12]: 

$$ r_m = e_m - K_s r_m - \sum_{n \neq m} k_{m \leftarrow n} (r_n - t_{m \leftarrow n}) \tag{1} $$

where $r_m$ is the response, $K_s$ is the self-inhibition constant, $e_m$ is the excitation of 
the $m$-th ommatidium, $k_{m \leftarrow n}$ is the inhibitory weight from other ommatidia, 
and $t_{m \leftarrow n}$ the threshold. 

Brodie et al. extended this equation to derive a spatiotemporal filter, where 
the input was assumed to be a sinusoidal grating [13]. This model predicts the 
Limulus retina experiments very well, but only as a single spatial frequency channel 
filter, which means that only a fixed spatial frequency input is allowed [13]. 
For this reason, their model cannot be applied to a complex image (e.g., 
visual illusions such as the Hermann grid illusion), as various spatial frequencies 
could coexist in the input. In the following section, we will build upon the 
Hartline-Ratliff equation and derive a filter that can address these issues. 

Fig. 3. Lateral inhibition in Limulus optical cells (Redrawn from [8]). The 
figure shows the disinhibition effect in Limulus optical cells. A. The retina of Limulus. 
Point light is presented to three locations (1, 2 and 3). B. The result of lighting positions 
1 and 2. The top trace shows the spike train of the neuron at 1, and the two bars below 
show the duration of stimulation to cells 1 and 2. When position 2 is excited, the neuron 
response of position 1 gets inhibited. C. Both 1 and 2 are illuminated, and after a short 
time, position 3 is lighted. The top two traces show the spike trains of cell 1 and cell 
2. The three bars below are the input durations to the three cells. As demonstrated in 
the figure, when position 3 is lighted, the neuron at position 2 gets inhibited by 3, so its 
ability to inhibit others is reduced. As a result, the firing rate of the neuron at position 
1 increases during the time the neuron at position 3 is excited. This effect is called 
disinhibition. 

3 Simplified Disinhibition Model Using Single Matrix 
Inverse Operation: The IDOG Model 

Rearranging equation 1 and generalizing to n inputs, the responses of n cells 
can be expressed in a simple matrix form as below by assuming the threshold 
and self-inhibitory constant to be zero (in this paper, we only care for spatial 
properties of visual illusion, so the assumption of zero self-inhibition rate is 
reasonable): 

$$ K r = e, \tag{2} $$

where r is the output vector, e is the input vector and K is the weight matrix: 

$$ K = \begin{bmatrix} 1 & -w(1) & \cdots & -w(n-1) \\ -w(1) & 1 & \cdots & -w(n-2) \\ \vdots & \vdots & \ddots & \vdots \\ -w(n-1) & -w(n-2) & \cdots & 1 \end{bmatrix} \tag{3} $$

The above 1D model can be easily extended to 2D by serialization. We can 
serialize the input and output to 1D vectors to fit the 1D model we have. The 
weight matrix K can be adapted to 2D by assigning the weight $K_{ij}$ from neuron 
j to neuron i according to the classic two-mechanism DOG distribution [6]: 



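$$ K_{ij} = \begin{cases} -w(|i,j|) & \text{when } i \neq j \\ 1 & \text{when } i = j \end{cases} \tag{4} $$

$$ w(x) = \mathrm{DOG}(x) = k_c e^{-(x/\sigma_c)^2} - k_s e^{-(x/\sigma_s)^2} \tag{5} $$

where $|i,j|$ is the Euclidean distance between neuron i and j, $k_c$ and $k_s$ are 
the scaling constants that determine the relative scale of the excitatory and 
inhibitory distributions, and $\sigma_c$ and $\sigma_s$ their widths. 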

The response vector r can finally be derived from equation 2 as follows, and 
we can apply inverse serialization operation to get vector r back into 2D format: 

$$ r = K^{-1} e. \tag{6} $$

Figure 4 shows a single row (corresponding to a neuron in the center) of the 
weight matrix K, plotted in 2D. The plot shows that the neuron in the center can 
be influenced by the inputs from locations far away, outside of its own receptive 
field. 
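
Equations 2 to 6 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the DOG form used for w, the parameter values, and the helper names (`dog`, `build_weight_matrix`, `solve`, `idog_response`) are assumptions, and the dense Gaussian elimination is only practical for very small images.

```python
import math


def dog(x, kc, ks, sigma_c, sigma_s):
    """Difference of two Gaussians (assumed form; parameters are hypothetical)."""
    return kc * math.exp(-(x / sigma_c) ** 2) - ks * math.exp(-(x / sigma_s) ** 2)


def build_weight_matrix(height, width, w):
    """K[i][j] = -w(distance) for i != j and 1 on the diagonal (equation 4)."""
    # Serialization: pixel (y, x) becomes index y * width + x.
    coords = [(y, x) for y in range(height) for x in range(width)]
    n = len(coords)
    K = [[0.0] * n for _ in range(n)]
    for i, (yi, xi) in enumerate(coords):
        for j, (yj, xj) in enumerate(coords):
            K[i][j] = 1.0 if i == j else -w(math.hypot(yi - yj, xi - xj))
    return K


def solve(K, e):
    """Solve K r = e by Gaussian elimination with partial pivoting."""
    n = len(e)
    A = [row[:] + [e[i]] for i, row in enumerate(K)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for c in range(col, n + 1):
                A[row][c] -= factor * A[col][c]
    r = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(A[i][c] * r[c] for c in range(i + 1, n))
        r[i] = (A[i][n] - rest) / A[i][i]
    return r


def idog_response(image, w):
    """Serialize a 2D image, solve K r = e, and reshape r back to 2D."""
    height, width = len(image), len(image[0])
    K = build_weight_matrix(height, width, w)
    e = [value for row in image for value in row]
    r = solve(K, e)
    return [r[y * width:(y + 1) * width] for y in range(height)]


# Hypothetical parameters: a narrow, weak center and a broad surround, so
# that w is negative (inhibitory) at every non-zero grid distance.
def w(x):
    return dog(x, kc=0.2, ks=0.3, sigma_c=0.5, sigma_s=1.5)


image = [[0.0, 0.0, 0.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0, 0.0]]
response = idog_response(image, w)
```

Solving the linear system directly avoids forming the inverse explicitly; the result is the same response vector that equation 6 describes.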







Fig. 4. An Inversed DoG filter. The filter (i.e., the connection weights) of the 
central neuron is shown in log scale. A. A 2D plot of the filter. B. A 3D mesh plot 
of the filter. C. The plot of the central row of the filter. Note the multiple concentric 
rippling tails. 



4 Results 

In this section, we will test our IDOG model first with three Limulus cells and 
then with several B-C illusions (Hermann grid, the White’s effect, and Mach 
band). Based on these experiments, we will demonstrate that disinhibition does 
play an important role in early visual processing. 

4.1 Disinhibition in 1D: A Model of the Limulus Retinal Cells 

Reconsidering the Limulus experiments, let us suppose three Limulus cells have 
the same input, say 100. We assigned the weights based on distance, w(1) = 
-0.5 and w(2) = -0.1, which indicates that if the cells are near neighbors, their 
inhibition effect is 50%, while if they are remote neighbors, the effect is reduced 
to 10%. The response r is then calculated as follows: 

$$ r = K^{-1} e = \begin{bmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 1 & 0.5 \\ 0.1 & 0.5 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 100 \\ 100 \\ 100 \end{bmatrix} = \begin{bmatrix} 83.333 \\ 16.667 \\ 83.333 \end{bmatrix}. $$

If we increase the input to the first neuron, $e_1$, a little (by 5%), the result 
changes as shown below: 

$$ r = K^{-1} e = \begin{bmatrix} 1 & 0.5 & 0.1 \\ 0.5 & 1 & 0.5 \\ 0.1 & 0.5 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 105 \\ 100 \\ 100 \end{bmatrix} = \begin{bmatrix} 90.278 \\ 12.500 \\ 84.722 \end{bmatrix}. $$

The third neuron increased its response from 83.333 to 84.722, since the second 
neuron was further inhibited by the first neuron (its response decreased from 
16.667 to 12.500), which had its input increased from 100 to 105. This result 
matches the experimental results from Hartline et al. [8], clearly demonstrating 
the disinhibition effect in our model. 
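
The three-cell example can also be reproduced without an explicit matrix inverse, by letting the lateral inhibition act recurrently until the responses settle. The sketch below is an illustration under assumptions, not the procedure used in the paper: the function name `recurrent_response`, the synchronous update rule, and the iteration count are hypothetical choices. For this particular weight matrix the iteration converges, and its fixed point satisfies Kr = e.

```python
def recurrent_response(K, e, iterations=200):
    """Iterate r_i <- e_i - sum_{j != i} K[i][j] * r_j, starting from r = e.

    A fixed point of this update satisfies K r = e (with K[i][i] = 1),
    so it equals K^{-1} e. Convergence is assumed here; it holds for the
    small weight matrix used below but not for every K.
    """
    n = len(e)
    r = list(e)
    for _ in range(iterations):
        r = [e[i] - sum(K[i][j] * r[j] for j in range(n) if j != i)
             for i in range(n)]
    return r


# Weight matrix for three cells with w(1) = -0.5 and w(2) = -0.1.
K = [[1.0, 0.5, 0.1],
     [0.5, 1.0, 0.5],
     [0.1, 0.5, 1.0]]

equal_input = recurrent_response(K, [100.0, 100.0, 100.0])
raised_input = recurrent_response(K, [105.0, 100.0, 100.0])

# Disinhibition: raising the first input suppresses the second cell,
# which in turn releases the third cell from inhibition.
second_cell_drops = raised_input[1] < equal_input[1]
third_cell_rises = raised_input[2] > equal_input[2]
```

The recurrent form makes the chain of influence visible: the third cell never receives more input, yet its response rises because its strongest inhibitor is itself inhibited.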

4.2 Disinhibition in 2D: The Hermann Grid Illusion 

In the Hermann grid, the illusory spots can be modeled quite well using 
conventional DOG filters. However, conventional DOG filters cannot explain why the 
periphery area appears brighter than the dark illusory spots. Convolving with 
conventional DOG filters results in more inhibition to the white peripheral area 
than to the intersections in the grid, because the periphery gets inhibition from all 
radial directions while the intersections only get inhibition from four directions. 

Our IDOG filter, which explicitly models disinhibition, provides a plausible 
explanation for this problem. Figure 5 shows the result of applying our filter to 
the Hermann grid image: C is the plot of the middle row of the filter response in 
B. The periphery is indeed brighter than the dark illusory spots, showing that 
disinhibition (and hence IDOG) can account for the perceived brightness in this 
particular example. 

4.3 Disinhibition in 2D: The White’s Effect 

The White’s effect [7] is shown in figure 6A: The gray patch on the black vertical 
strip appears brighter than the gray patch on the right. As shown in figure 2, 
DOG cannot explain this illusion. However, disinhibition plays an important role 
in this illusion: Although the gray patch on the black strip receives inhibition from 
the two surrounding white strips, disinhibition is relatively stronger for it than for 
the gray patch on the right side. Because of this, the gray patch on the right 
side appears darker than the left side patch (C in figure 6). 

Fig. 5. The Hermann grid illusion and prediction. A. Part of the Hermann 
grid which we used to test the response of the periphery and the illusory spots. B. 
The output response of IDOG. C. The prediction using the IDOG filter (from B). The 
illusory spots are at positions p1, p2 and p3, which have a brightness value lower than 
the periphery. (The curve shows the column-wise sum of rows 27 to 29.) 

Fig. 6. The White’s effect and prediction. A. The White’s effect stimulus. B. 
The output using IDOG. C. The prediction using the IDOG model (from B). The gray 
patch on the left results in a higher value than the right patch. (The curve shows the 
column-wise sum of rows 11 to 19.) 



4.4 The Mach Band 



Compared with the conventional DOG filter, one advantage of the IDOG model 
is that it preserves the different levels of brightness as well as enhancing the 
contrast at the edges. As demonstrated in figure 7, the four shades of gray are clearly 
separated using IDOG. These different shades are not preserved using a 
conventional DOG filter. Note that this may simply be because the sum of the DOG 
matrix equals zero, and scaling up $k_c$ in equation 5 can correct the problem. 
However, there is one subtle point not captured in the conventional DOG approach: 
the wrinkle (figure 7E) near the Mach bands observed in Limulus experiments 
[14]. Compared to the IDOG result, we can clearly see that this wrinkle is absent 
in the DOG output (figure 7C). 

Fig. 7. The Mach band. A. The Mach band input image. B. The output using 
a conventional DOG filter. The different brightness levels are not preserved. C. An 
expanded view of the inset in B. D. The output using IDOG. The different brightness 
levels are preserved. E. An expanded view of the inset in D. 
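
The difference between the two filters on a staircase input can be illustrated in one dimension. The sketch below rests on several assumptions that are not taken from the paper: a hypothetical five-tap zero-sum DOG kernel, matching inhibitory weights of 0.4 and 0.1 for the recurrent model, edge replication for the convolution, and Gauss-Seidel sweeps to solve Kr = e. It only shows that a zero-sum convolution maps every flat region to zero, while the inverse operation keeps the plateaus at distinct levels.

```python
def dog_convolve_1d(e, kernel):
    """Single-pass convolution with a symmetric kernel (edges replicated)."""
    half = len(kernel) // 2
    last = len(e) - 1
    out = []
    for i in range(len(e)):
        acc = 0.0
        for d in range(-half, half + 1):
            j = min(max(i + d, 0), last)
            acc += kernel[d + half] * e[j]
        out.append(acc)
    return out


def idog_1d(e, k1, k2, sweeps=200):
    """Solve K r = e by Gauss-Seidel sweeps, where K has 1 on the diagonal
    and inhibitory weights k1, k2 at distances 1 and 2 (assumed values)."""
    n = len(e)
    r = list(e)
    for _ in range(sweeps):
        for i in range(n):
            inhibition = 0.0
            for d, k in ((1, k1), (2, k2)):
                if i - d >= 0:
                    inhibition += k * r[i - d]
                if i + d < n:
                    inhibition += k * r[i + d]
            r[i] = e[i] - inhibition
    return r


# Staircase input: four gray levels, 16 samples each.
levels = [1.0, 2.0, 3.0, 4.0]
staircase = [level for level in levels for _ in range(16)]
centers = [8, 24, 40, 56]  # one sample in the middle of each plateau

dog_out = dog_convolve_1d(staircase, [-0.1, -0.4, 1.0, -0.4, -0.1])
idog_out = idog_1d(staircase, k1=0.4, k2=0.1)

dog_plateaus = [dog_out[c] for c in centers]
idog_plateaus = [idog_out[c] for c in centers]
```

In this sketch the DOG plateaus are all zero because the kernel sums to zero, whereas the IDOG plateaus settle near half of each input level (the reciprocal of the row sum of K), so the four shades remain distinguishable.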



5 Discussion and Future Work 

We have shown that by explicitly modeling disinhibition, we can more accurately 
explain various B-C illusions. Although there are many other improved DOG 
filter models, such as the oriented DOG (ODOG) filter proposed by Blakeslee 
and McCourt [15], they still cannot (under our analysis) explain certain problems 
like the phenomenon related to the periphery area of the Hermann grid (figure 1). 

Our model is strongly motivated by biological facts as well as computational 
considerations. First, experimental evidence shows that the inhibition in the 
retinal receptive fields can be explained by the isotropic amacrine and horizontal 
cells. Second, we utilize the classical two-mechanism DOG distribution. Third, as 
demonstrated by the experiments of Hartline and colleagues using Limulus cells, 
disinhibition is a natural effect of recurrent lateral inhibition, which does not 
work well with a single-pass convolution operation. Another interesting 
observation is that the IDOG filter has a shape similar to that of the circular Gabor 
filter [16]. Circular Gabor filters have been successfully used in rotation-invariant 
texture discrimination, and it would be interesting to see if IDOG can also be used 
in such a domain. Also, there is further psychophysical evidence [17] suggesting 
that early visual processing can be modeled by filters similar to our 
disinhibition-based IDOG filters. 

One limitation of our approach is that the inverse weight matrix results in 
a non-local operation, so it can be computationally inefficient. To overcome 
this issue, we can use an approximate algorithm. Based on our observation, the 
IDOG filter usually converges to a value near zero at a distance twice that of 
the DOG-based receptive field. We can therefore use an IDOG filter that is twice the 
original receptive field size and still use a local convolution operation to process 
larger images. 

Potential applications of the IDOG algorithm include designing robust visual 
sensor arrays. The disinhibition network model may have the ability to improve 
the robustness of the system. Considering that each individual cell has outputs 
to some other cells while those cells are connected with the input of this cell, 
the disinhibition structure actually forms a control loop for each individual in 
the network. For each single cell, the rest of the system acts as a regulator, 
and the context signal received by the rest can be treated as a reference. Thus, 
when a single cell becomes less stable, the rest of the system can act as a 
regulator to increase the overall stability. This feedback structure may be applied 
to visual sensor arrays. An unstable situation can arise when there is noise in the 
sensors or due to other hardware failures. In such a case, the nearby neighbors of 
the problematic unit can regulate the unstable input by utilizing the neighbors’ 
inputs as a reference, which could be automatically achieved by the disinhibited 
network model. 

Another application area is where accurate perception of intensity level is 
critical. For example, perceived incoherences in luminance in video display panels 
can be detected and corrected using IDOG. The concept of disinhibition can also 
be applied to higher brain functions such as categorization and memory (e.g., 
Vogel [18] proposed a model of associative memory based on disinhibition). We 
believe a close analysis of cortical horizontal connections and their physiology 
under the disinhibition framework can provide us with new insights on their 
functions. This in turn will allow us to apply the general concept of disinhibition 
in advanced intelligent systems, firmly based on biological observations. 



6 Conclusion 

We have shown that certain limitations of DOG filters can be overcome by 
explicitly modeling disinhibition, and that a simple feedforward filter can be derived. 
Using the IDOG filter, we were able to successfully explain several B-C illusions 
that were not sufficiently explained in previous models. Our work also shows 
that complicated recursive effects can be explicitly calculated or approximated 
using a single matrix multiplication. The results suggest that contextual effects 
can be processed through recurrent disinhibition, and a similar analysis may 
be applicable to higher brain functions. Such an analysis will allow us to apply 
disinhibition in building advanced intelligent systems with a firm grounding 
in biology. 


Abstract. How can artificial or natural agents autonomously gain 
understanding of their own internal (sensory) state? This is an important 
question not just for physically embodied agents but also for software 
agents in the information technology environment. In this paper, we 
investigate this issue in the context of a simple biologically motivated 
sensorimotor agent. We observe and acknowledge, as many other researchers 
do, that action plays a key role in providing meaning to the sensory state. 
However, our approach differs from the others: We propose a new learning 
criterion, that of on-going maintenance of sensory invariance. We 
show that the action sequence resulting from reinforcement learning of this 
criterion accurately portrays the property of the input that triggered a 
certain sensory state. This way, the meaning of a sensory state can be 
firmly grounded on the choreographed action which maintains invariance 
in the internal state. 



1 Introduction 

Information technology has been largely driven by the growth in quantity, speed, 
and precision, but less by the improvement in quality and relevance of infor- 
mation. For that reason, while the amount of data being accumulated daily is 
staggering, our understanding of the data is not. The problem boils down to that 
of meaning (or semantics), i.e., what do these data mean and how can software 
systems understand the meaning of the data that they process? In this paper, 
we put this problem in the context of the brain (cf. [1,2]), which is the only 
known device which naturally processes meaning, and find out what could be a 
potential solution. 

The brain is made up of 100 billion neurons [3], which generate a complex 
pattern of activity in response to sensory stimuli from the external environment. 
A fundamental question in brain and cognitive sciences is, how do we 
understand what this pattern means? To make the question even simpler, we can ask 
what does a spike of a single neuron mean? [4]. Even this reduced problem is 
not trivial, and it took an enormous effort to come to the current state of 
understanding, beginning from muscle spindles [5] to cat visual cortical neurons [6] to 
sophisticated stimulus reconstruction methods developed lately (see, e.g., [4]). 

[Figure 1 panels: (a) External observer; (b) Internal observer.] 

Fig. 1. External vs. Internal Observer. The problem of decoding neural spikes is 
seen from the outside (a), and from the inside (b) of a perceptual agent. The neuron 
shown as a circle inside the box performs an input (I) to spike (S) transformation 
using the function f : I → S. The function f is supposed to be a highly complex one, 
and the neuron may be deeply embedded in the agent (i.e., it is not at the immediate 
sensory transduction stage, such as the photo-receptors). The task then is to find out 
the property of input I given just the spikes S. 

A popular approach to this question is through associating the neural spikes 
with the stimulus that triggered those spikes [7] (see [8] for a review). Such 
methods have been successful in characterizing the neural spiking properties and 
accurately predicting the stimulus given just the spike train. This method in- 
volves the experimenter systematically varying the environmental stimulus while 
measuring the neural response (see, e.g., [4] chapter 2), so that at a later time, 
when only the spike train is observed, something can be said about the stimulus 
property. Mathematically, this is conveniently written using the Bayes theorem 
[4] (see Figure 1a): 

$$ P(I|S) = \frac{P(S|I) P(I)}{P(S)} $$

where I is the input stimulus and S is the spike train. Note that the likelihood 
term P(S|I) requires that we have either empirical statistics or a reasonable 
model of the stimulus-to-spike translation. Thus, the interpretation of the current 
spike train P(I|S) seems to depend on direct knowledge about the stimulus 
properties, one way or another, which introduces the problem of circularity (cf. 
[9]). 
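
As a worked illustration of the Bayes rule above, the sketch below decodes a discrete stimulus from a single spike observation. All names and numbers are hypothetical: four candidate orientations with a uniform prior, and a neuron assumed to spike with probability 0.8 for its preferred 45° input and 0.1 otherwise. Note that the likelihood table has to be supplied from outside, which is exactly the knowledge that an observer without access to the stimulus lacks.

```python
def posterior(prior, likelihood, observation):
    """Return P(I | S = observation) for every stimulus I.

    prior:      dict mapping stimulus -> P(I)
    likelihood: dict mapping stimulus -> dict mapping observation -> P(S | I)
    """
    # Evidence P(S): total probability of the observation over all stimuli.
    evidence = sum(likelihood[i][observation] * prior[i] for i in prior)
    return {i: likelihood[i][observation] * prior[i] / evidence for i in prior}


orientations = [0, 45, 90, 135]
prior = {o: 1.0 / len(orientations) for o in orientations}

# Hypothetical tuning: the neuron prefers 45 degree input.
likelihood = {o: {"spike": 0.8 if o == 45 else 0.1} for o in orientations}
for o in orientations:
    likelihood[o]["no spike"] = 1.0 - likelihood[o]["spike"]

post = posterior(prior, likelihood, "spike")
most_likely = max(post, key=post.get)
```

With these assumed numbers, observing a spike raises the probability of the 45° stimulus from the prior 1/4 to 8/11, but only because the table of P(S|I) was given in advance.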

Now suppose we ask the same question “what does a single spike mean?” to 
another neuron in the brain where such spikes are received (Figure 1b). Because 
this neuron does not have immediate knowledge about the environmental 
stimulus associated with the spike it received nor that of the receptive field properties 
(as it does not have the tools of the experimenter) the neuron cannot apply the 
technique mentioned above. (This problem can also be seen in the context of 
the Bayesian theorist, i.e., not merely an observer; an issue raised by Jepson and 
Feldman [10].) For example, consider a similar situation depicted in figure 2. 
Inside the agent, the only available information is the sensory array activity, 
so if we are trapped inside this agent without access to the visual environment 
outside, we can never figure out what the sensor activity means. 

This paper begins by realizing this as a genuine issue. Our contention is 
that such a dilemma can be overcome through learning how to associate sen- 
sory activity to the motor actions the brain itself generates. The importance of 
sensorimotor learning has been emphasized in the past by many researchers: 

1. schema theory [11,12]; 
2. learning of sensorimotor contingency [13,14]; 
3. two-level interdependence of perception and action [15]; 
4. ecological perception of affordances [16]; 
5. subsumption architecture in robotics [17]; 
6. learning of natural semantics in robots [18]; 
7. sensory-motor coordination [19,20,21,22] and imitation [23,24] in 
autonomous agents; 
8. dynamical systems approach in agent-environment interaction (reviewed in 
[25]); 
9. the role of action in meaning and semantics in a dynamically coupled system 
[1] [2] (pp.13-17); 
10. mirror neurons and imitation in primates [26,27]; 
11. the role of branching thalamic afferents in linking action and perception 
[28,29]; 
12. motor learning enhancing perceptual performance [30]; 
13. sensory substitution through active exploration [31,32]; 
14. fixed action patterns (FAP) and thought as internalized action [33] 
(pp.134-141); 
15. perception viewed as internal simulation of action [34] (pp.9-11); 
16. the role of action in consciousness [35] (pp.193-196). 

All of these recognize action as a key element in intelligent brain function. 

Building upon these, we begin by examining how action can help in the 
autonomous discovery of meaning in agents as shown in figure 1b. Our problem 
formulation is similar in spirit to Philipona et al. [14] and Pierce and Kuipers 
[21,22], where a sensorimotor agent has to learn about its own raw sensors and 
actuators. The twist is that we provide a simple criterion that can exactly link 
the sensory states and the associated actions in a meaningful way. 

Below, we first define the problem in terms of a sensorimotor agent we intro- 
duced in figure 2, and propose a learning algorithm based on on-going sensory- 
invariance driven motor action. The basic idea is that the agent has knowledge 
about its own movements, and the movements that it generates that reliably 
activate a particular sensor in the sensor array constitute the meaning of that 
sensor’s spike. The acquired meaning for each sensor and the resulting behavioral 
patterns are presented next, followed by discussion and conclusion. 

Fig. 2. A Sensorimotor Agent. An illustration of a simple sensorimotor agent is 
shown. The agent has a limited visual field where the input from the environment 
is projected. A set of orientation-tuned neurons receive that input and generate a 
pattern of activity in the sensory array (black marks active). In the situation here, the 
45° sensor is turned on by the input. Based on the sensory array pattern, after some 
processing (signified by “?”), the x and y values of the motor vector are set, resulting 
in the movement of the visual field and a new input is projected to the agent. 



2 Meaning of Sensory State in a Sensorimotor Agent 

To better illustrate our point, let us consider a small, concrete example as shown 
in figure 2, a simple sensorimotor agent. The agent has a limited visual field, 
and the incoming visual signal is transformed via oriented filters (mimicking 
primary visual cortical neurons) into a spike pattern in the sensory array. Let 
us further assume that the agent does not have any knowledge (e.g., about the 
receptive field structure) of its oriented filters. The task of the agent then is to 
attach meaning to its own sensory array activity pattern, i.e., to come to an 
understanding that each sensor represents a certain oriented visual input. 

Imagine we are inside this agent, isolated from the world outside the box, 
sitting near the big “?” sign. It is questionable then whether we would ever be able
to associate an oriented visual input stimulus with the spikes generated in the
sensor array because we cannot peek outside, and we do not know the particular 
mechanism of the filters. The spike, in principle, could have been generated from 
any sensory modality, e.g., auditory or tactile input. 

The only way we can see this issue resolved is through action, that is, the 
movement generated by the agent. This point is not entirely obvious at first, so 
let us elaborate a little bit on what we mean. As shown in figure 2, we included 
the capability of action in the agent. The agent is able to gaze at different parts of 
the scene by moving around its visual field. The x and y variables correspond to 
the movement of the visual field in the x and the y direction, respectively. Thus, 
these two variables are like motor commands. We, sitting on that “?” sign, can 
generate different combinations of (x, y) values and observe the changing pattern 
in the sensory array. By relating the sensor activity and the motor command that 
was just generated, certain aspects of the sensor property can be recovered. We 
believe this is generally agreeable, but it is too general. It raises the question
of what those “certain aspects” of the sensor property are and how they can be
learned.

A crucial insight that occurred to us at this point was that certain kinds of
action tend to keep the sensory activity pattern unchanged (i.e., invariant)
during vigorous movement, and this action exactly reflects the property
of the sensory stimulus. For example, consider the state of the agent as shown
in figure 2, where a 45° input is presented, and the corresponding sensor is activated
in the agent. Now imagine we move the visual field according to the motor
vectors (1, 1), (1, 1), ..., (1, 1), (-1, -1), (-1, -1), ..., (-1, -1), which corresponds
to a back-and-forth movement along the 45° diagonal (i.e., aligned with the input).
Such an action will keep only the 45° sensor turned on during the motor act,
i.e., the sensory array will stay invariant. We can see that this motion, generated
while trying to keep the sensor array unchanged, has led the agent to perform
an act, the property of which reflects that of the stimulus. Thus, we are led
to conclude that associating this kind of sensory-invariance driven action with
spikes can potentially serve as a meaning for each sensory neuron.^
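
The invariance argument above can be checked on a toy scene. The following is a minimal sketch, not the authors' implementation: the diagonal bitmap, the helper names view and is_invariant, and the starting position are all hypothetical, and only the 51 x 51 scene and 9 x 9 visual field sizes come from the text. It shows that stepping along the edge leaves the visual input unchanged, whereas stepping across it does not.

```python
SIZE, FIELD = 51, 9  # scene and visual-field sizes taken from the text

# Hypothetical scene: a 3-pixel-wide diagonal edge through a 51 x 51 bitmap.
scene = [[1 if abs(x - y) <= 1 else 0 for x in range(SIZE)] for y in range(SIZE)]


def view(x0, y0):
    """Return the FIELD x FIELD patch whose top-left corner is at (x0, y0)."""
    return [row[x0:x0 + FIELD] for row in scene[y0:y0 + FIELD]]


def is_invariant(x0, y0, motor):
    """True if applying the motor vector leaves the visual input unchanged."""
    dx, dy = motor
    return view(x0, y0) == view(x0 + dx, y0 + dy)


start = (20, 20)
print(is_invariant(*start, (1, 1)))    # motion along the edge: True
print(is_invariant(*start, (-1, -1)))  # motion back along the edge: True
print(is_invariant(*start, (1, 0)))    # motion across the edge: False
```

Because the input is unchanged, any sensor driven by that input would also stay in the same state, which is the property the learning rule in the next section rewards.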

To test this insight that ascribing meaning to sensory neuron activity is pos- 
sible through learning the sensorimotor association based on sensory-invariance, 
we implemented a learning agent following the description in figure 2. The fol- 
lowing sections describe the learning rule of the agent, followed by the results. 

3 Learning of Sensory-Invariance Driven Action 

Consider the agent described above (figure 2). We define a simple learning rule 
based on our idea of sensory-invariance driven action. The agent has the current 
state of its sensors S (the sensory array), and a set of actions D (possible com- 
binations of the motor vector) that it can perform. For simplicity, we limit the 
sensor state set S to four different values 

S = {0°, 45°, 90°, 135°}, (1)

which correspond to the four different orientation preferences (note that 0° is
the same as 180° etc.) of the sensors, and the action set D to eight different
categories

D = {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°}, (2)

which are the possible directions of motion of the visual field with a limited distance
of movement. Thus, the above corresponds to (1, 0), (1, 1), (0, 1), (-1, 1), ...
in terms of the motor vectors we mentioned earlier (the magnitude of the motion in x
and y was either 0 or 1).
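
As an illustration of how the eight directions in D relate to the motor vectors, the sketch below maps each angle to an (x, y) step by rounding its cosine and sine. The function name motor_vector is hypothetical, and the rounding is simply one convenient way to reproduce the vectors listed above; the text does not say how the mapping was implemented.

```python
import math

ACTIONS = [0, 45, 90, 135, 180, 225, 270, 315]  # action set D, in degrees


def motor_vector(direction_deg):
    """Map a direction in D to a unit-step motor vector (x, y)."""
    rad = math.radians(direction_deg)
    return (round(math.cos(rad)), round(math.sin(rad)))


for d in ACTIONS:
    print(d, motor_vector(d))
```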

The learning task of the agent can then be treated as a standard reinforce- 
ment learning problem with a Markov assumption (see, e.g., [36,37]). The goal 

^ Note that the invariance of this kind is different from that in Philipona et al. [14] 
where invariance is gained as a result of compensated motion, but not during the 
motion itself. 




Autonomous Acquisition of the Meaning of Sensory States 181 





Fig. 3. Inputs Used for Training and Testing. The agent was trained and tested 
on 51 X 51 bitmap images each containing a 3-pixel wide oriented edge. Four inputs 
with four different orientations are used for the experiments (from the left: 0°, 45°, 
90°, and 135°). 



of the agent is to select an action from the action set D that keeps the sensory
array activity invariant. Thus, the reward is simply the degree of sensory
invariance in successive stages of action. More formally, the agent has to learn a
policy function π,

\pi : S \to D,    (3)

at step t which selects a direction of motion d_t ∈ D based on the previous state
s_t ∈ S so that the resulting reward r_t is maximized. The execution of the policy
at each state s_t results in reward:

r_t = r(s_t, d_t),    (4)

based on the reward function r(s, d) for s ∈ S, d ∈ D, and this function is
updated as follows:

r_{t+1}(s, d) = \begin{cases} r_t(s, d) + \alpha f_t & \text{if } s_t = s_{t-1}, \\ r_t(s, d) - \alpha f_t & \text{if } s_t \neq s_{t-1}, \end{cases}    (5)

where r_{t+1} is the reward at step t + 1; α (= 0.01) is a fixed learning rate; and
f_t is the number of action steps taken by the agent up till t which resulted in
either (1) continuously maintaining the sensory array to be invariant or (2) the
opposite (i.e., changing all the time). Thus, if s_t = s_{t-1} was true for the past
n (= a large number) consecutive steps, then f_t = n, and this will increase
the reward associated with (s, d). On the other hand, n consecutive failures of
maintaining sensory invariance will also lead to a high f_t value, but this time
the reward for (s, d) will decrease. The reward function is simple, but even such
a simple rule is sufficient for the agent to learn sensorimotor associations.
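
To make the effect of the run-length factor concrete, here is a small sketch of the update in equation 5. It rests on two assumptions: that f_t is the length of the current run of consecutive successes (or failures), and that the result is clipped to the range between 0 and 1 mentioned in Section 4. The helper names update_reward and run_streak are hypothetical.

```python
ALPHA = 0.01  # fixed learning rate from the text


def update_reward(r, invariant, f_t):
    """One application of eq. (5), limited to the range [0, 1]."""
    delta = ALPHA * f_t
    r = r + delta if invariant else r - delta
    return min(1.0, max(0.0, r))


def run_streak(r, invariant, n):
    """Apply the update for n consecutive steps with the same outcome."""
    for f_t in range(1, n + 1):
        r = update_reward(r, invariant, f_t)
    return r


# Five consecutive invariant steps add 0.01 * (1 + 2 + 3 + 4 + 5) = 0.15.
print(round(run_streak(0.10, True, 5), 4))   # 0.25
# Three consecutive failures subtract 0.01 * (1 + 2 + 3) = 0.06.
print(round(run_streak(0.50, False, 3), 4))  # 0.44
```

Under this reading, a run of n steps changes the reward by α n(n + 1)/2, so sustained invariance is rewarded much more strongly than isolated successes.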

In the following, we will present the learned policy π and the behavior of the
agent which mimics the input stimulus.



4 Experiments and Results 

In the learning process the agent interacted continuously with the visual envi- 
ronment in a series of episodes. During each episode, the agent was presented 
with a 51 X 51 bitmap image containing an oriented edge (figure 3). The visual 
field of the agent was 9 x 9 and could slide across the image. The visual field
input was directly compared to each of the four sensory filters (also 9 x 9 in size)
and the sensory state s was set to a value θ when there was an exact match with
one of the four orientations θ ∈ S (see equation 1).

The agent was trained to learn the policy π : S → D using equation 5 by
going through the four different inputs. Since the sizes of the state and the action
sets were |S| = 4 and |D| = 8, the policy π and the associated rewards can be
enumerated in a 4 x 8 table. At each step, the next direction of motion d ∈ D (see
equation 2) was determined based on the expected reward values stored in such
a reward table of the agent. The reward table was initialized to hold uniformly
distributed random numbers between 0 and 1 normalized by their sum. Also,
the reward was limited to the range 0 < r < 1. Figure 4a-d shows the initial
reward values, where each polar plot corresponds to a state s ∈ S and shows
the reward r (distance from origin) for each action d ∈ D (angle) for
the given state s.

The training was carried out until the agent was able to learn to maximize 
the reward by consistently meeting the sensory-invariance criterion. The training 
usually lasted for up to 500 steps for each input. The reward table after training 
is visualized in figure 4e-h. The results clearly show that the agent learned
to associate with the current sensory state s a motion d that reflects (or mimics)
the orientation of the environmental input that triggered that state. For
example, in figure 4f, the maximum reward values associated with the sensory
state s = 45° are d = 45° and d = 225°, indicating a preference for a back-and- 
forth movement along the 45°-axis which exactly mimics the oriented property 
of the visual input. The same is true for all other states (figure 4e, g, and h). 
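
The qualitative outcome of training can be reproduced with a heavily simplified stand-in for the experiment. The sketch below is not the authors' simulation: it replaces the bitmap scene with a toy rule (keeps_invariant) stating that the sensor stays on only when the motion is parallel to the edge, it normalizes the initial rewards per state, and it chooses actions with an epsilon-greedy rule. All three choices are assumptions, since the text does not specify them; the update itself follows equation 5 with f_t read as the length of the current run.

```python
import random

STATES = [0, 45, 90, 135]                       # sensor orientations, eq. (1)
ACTIONS = [0, 45, 90, 135, 180, 225, 270, 315]  # gaze directions, eq. (2)
ALPHA = 0.01                                    # learning rate from the text


def keeps_invariant(s, d):
    """Toy environment (assumption): the sensor stays on only for motion along the edge."""
    return d % 180 == s


def init_row(rng):
    """Uniform random rewards normalized by their sum (per-state normalization assumed)."""
    raw = [rng.random() for _ in ACTIONS]
    total = sum(raw)
    return {d: v / total for d, v in zip(ACTIONS, raw)}


def train(steps=500, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    table = {s: init_row(rng) for s in STATES}
    for s in STATES:
        streak, last = 0, None
        for _ in range(steps):
            if rng.random() < epsilon:          # occasional exploration (assumed)
                d = rng.choice(ACTIONS)
            else:                               # otherwise follow the highest reward
                d = max(ACTIONS, key=lambda a: table[s][a])
            ok = keeps_invariant(s, d)
            streak = streak + 1 if ok == last else 1   # f_t: length of the current run
            last = ok
            delta = ALPHA * streak if ok else -ALPHA * streak
            table[s][d] = min(1.0, max(0.0, table[s][d] + delta))
    return table


table = train()
for s in STATES:
    best = max(ACTIONS, key=lambda a: table[s][a])
    print(f"state {s:>3} deg -> preferred direction {best:>3} deg")
```

In this toy setting, aligned actions can only gain reward and all others can only lose it, so the preferred direction for each state ends up parallel to the input orientation, in line with the pattern reported for figure 4e-h.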

One thing to note from the actual numerical reward values (not shown) is that 
there is a slight difference (<0.01) between reward values for the two opposite 
directions separated by 180° (e.g., d = 45° and d = 225°). The minor difference 
helps the agent to have an initial bias in the selection of the first movement, and 
to maintain a momentum to continuously follow along an orientation instead 
of rapidly oscillating between two opposite directions. Note that this desirable 
effect was not explicitly built in by us, but rather, emerged from the sensory- 
invariance driven learning rule. 

In order to verify if our analysis of the reward table is accurate, the trained 
agent was tested with fixed oriented inputs and the resulting motor behavior 
was observed. Figure 5 shows the action sequence generated by the agent for two 
different inputs with orientations 0° and 135°. The plots show the movement of 
the visual field of the agent in response to the given input. The results show 
that the action of the agent based on the learned reward table exactly reflects 
our analysis above: The agent, upon activation of a single orientation sensor, 
performs a movement mimicking the external input that triggered that sensor, 
thus assigning (in our interpretation) a meaning to the sensory neuron’s spike in 
terms of its own action. 
Fig. 4. Reward Vector of Each Sensory State. The reward values of the four
possible sensory states (0°, 45°, 90°, and 135°) are shown in polar coordinates. The
top row from (a) to (d) shows the reward values before training, and the bottom row
from (e) to (h) shows the reward values after training. In each plot, for each point
(θ, δ), the angle θ represents
the direction d ∈ D of the visual field movement (there are 8 possible directions),
and the distance δ from the origin represents the associated reward value given the
current sensory state (shown below each plot). The reward values were between 0
and 1. Initially, the rewards are randomly assigned for each direction of motion for
each sensory state. After the agent is trained, the reward values become maximal for
the movement along the orientations that correspond to the input that triggered that
sensory state.



5 Discussion and Future Work 

The main contribution of our work is the realization that a sensorimotor agent 
can find the meaning of its sensory state within its own actions, but more im- 
portantly, that the objective of maintaining on-going sensory-invariance plays a 
key role in allowing the agent to autonomously discover this semantic link. 

An important message implicit in our work is that invariance can be seen 
from a very different perspective. Usually, invariance is seen as something that 
needs to be detected or picked up from the environment by the perceptual sys- 
tem (e.g., invariant feature detection in vision). However, our approach differs in 
that invariance is sought after in the internal activity pattern and it is internally 
enforced through a well-choreographed action. We speculate that there may be 
a link between this kind of action-based neural invariance and invariant sensory 
features in the conventional sense. For example, an approaching object will expand
as time flows (turning on a certain neuron), and the same kind of effect
can be achieved through a forward motion (again turning on the same neuron).
Thus, the meaning of that neuron firing can be understood in terms of the action
that would turn on that neuron reliably (cf. Gibson’s work on ecological
perception and detection of environmental invariances [16]). Thus, even without
action, when that neuron turns on (i.e., object is approaching), the brain can
infer that it is analogous to moving forward toward the object (also see [14] on
compensatory motion).

[Figure 5 panels. 0° input: (a) 0 < step < 30, (b) 30 < step < 60, (c) 60 < step < 90. 135° input: (d) 0 < step < 30, (e) 30 < step < 60, (f) 60 < step < 90.]

Fig. 5. Behavior of the Agent after Training. Each plot shows a snapshot of 30
steps of movement of the agent’s visual field in the 51 x 51 scene (only every 6 steps
are shown). The triangles indicate the location of the visual field in the scene and their
grayscale values represent the simulation step (black is the most recent step). The light
gray lines in the background show the oriented input edges. Two simulation runs are
shown here: (a) to (c) are for 0° input and (d) to (f) are for 135°. The trained agent
successfully generates a motion sequence to trace the input in both runs based on its
sensor state and policy π. For example, in (b) the agent starts in the center and moves
right, and bounces back when it reaches the end of the input (c).

Bell [38] posed an interesting question regarding the perception-action cycle. 
To quote, “What quantity should a perception-action cycle system maximize, 
as a feed-forward channel might maximize its capacity?”, which is relevant in
our context. This is an important question, and we believe our reward criterion 
of maximizing on-going sensory invariance can serve as a potential answer. As 
we have seen, such a criterion can be used to internally learn the meaning of 
sensory state, which may be a very important function for a “perception-action
cycle system” to possess. 

One criticism we anticipate is that if the agent had a rich array of sensors, 
such as a 2D matrix of RGB pixels, then the properties of the visual environ- 
ment can be easily recovered within the agent through unsupervised learning 
even without direct access to the outside world. However, this does not help 
solve the problem, because such rich information is only available at the very
first stage of sensory processing. The next stage, and the stages following that, 
etc. only receive a more and more encoded version from the previous stage, just 
like the sensory array in our agent which receives only encoded spikes from the 
orientation-tuned filters. Thus, the same difficulty can remain. Another related 
criticism may be that the learning criterion we proposed may not be applica- 
ble to all stages of sensory processing. For example, the retinal ganglion cells 
and the lateral geniculate nucleus (LGN) in the early visual pathway show a 
center-surround receptive field property. It is not easy to imagine any kind of 
action sequence that would possibly keep these neurons’ activities invariant. Our 
response to this is that sensorimotor coupling does not seem to exist at such an 
early stage of sensory processing (see e.g. [39]), and thus we do not claim that 
our approach will work in this stage. 

One potential limitation of our account is that our model implicitly assumes 
that the agent has direct knowledge about its own movement, upon which the 
meaning of the sensors is grounded. The work by Philipona et al. [14] and
Pierce and Kuipers [21,22] points in a direction where a possible resolution can
be found. They showed that without any knowledge of the external world, physical
properties of the environment can be learned through sensorimotor learning.
In particular, Philipona et al. [14] observe that there are two classes of sensors,
exteroceptive and proprioceptive. They observed that agents have complete control
over proprioceptive sensors (i.e., they can exactly predict the values based on 
their actions), whereas the same is not true for exteroceptive sensors. Thus, 
action, and the closely tied proprioceptive sensors may provide a more direct 
knowledge (as we proposed in this paper) to the agent than other common sen- 
sors. Another point is that unlike perception which is highly underconstrained 
(the problem of inverse-optics), action or movement is strictly constrained by 
the bodily limits (e.g., we cannot stretch beyond a certain point). Such strong 
constraints may pose a learning problem to the brain which is significantly easier 
than perceptual learning. 

Can our approach be extended into other sensory modalities such as audition,
somatic sense, olfaction, etc.? Our approach is general enough to be easily
extended into certain modalities such as somatic sense (see e.g. [40]), but it 
cannot work very well in domains where there is not much correlation between 
action and the perceived sensory state, e.g., olfaction.^ Here, it would be useful 
to mention a different kind of meaning, those that are related to reinforcement 

^ There is however some evidence that the act of sniffing can alter the perceived sense 
of smell [41], which indicates that our approach may also be applicable, although in 
a limited way, in the olfactory domain. 
signals such as gustatory rewards. Rolls proposed in [42] that semantics of brain
states can be either related to (1) sensorimotor skills (as we also proposed) or to 
(2) environmental reinforcement. Olfaction, gustation, and other such sensory 
modalities convey signals that are highly related to survival values, and thus 
they may take on the latter kind of meaning. 

The model presented here is decidedly simple to convey the essence of the 
problem, and as such, it can be extended in several directions. We would like 
to note that sensory invariance does not always have to be defined on a single 
neuron’s activity. Any kind of pattern, be that spatial or temporal, can be at- 
tempted to be maintained invariant while performing an action. Thus, meaning 
based on action can also be ascribed to a repeating pattern of activity, not just 
to a single spike. Also, invariance can be maintained in only one part of the 
global pattern, which can be seen as a relaxed version of the invariance crite- 
rion. Attentional mechanisms [43] may be necessary for this purpose. We believe 
investigating in this direction will be most fruitful, and in fact we are currently 
steering our effort into this problem. 

Finally, we would like to re-emphasize that the problem of meaning we raised 
in this paper is not only a central issue in autonomous agent or neuroscience 
research (cf. [2]), but also in information technology (IT) in general. The current 
information technology is mostly syntax-driven, and there is not much provision 
for autonomous semantics: At the end of the day, the entities that assign meaning 
to the meaningless symbols are us, humans [44]. This is becoming a serious 
problem because of the rapid growth in the amount and rate of data, since 
we humans no longer have sufficient time to attach meaning to the continuous 
stream of data. The problem seems to be that current IT systems are passive 
processors of information. As we have seen in this paper, activeness and action are
key to autonomous understanding [2]. Thus, exploring how and in what manner
we can make IT systems active may allow us to create major breakthroughs
for future IT.

6 Conclusion 

From the realization that neural decoding methods requiring direct knowledge 
of the stimulus pose a problem when viewed from within the brain, we derived 
a novel solution to the problem of learning the meaning of sensory states, i.e., 
through sensorimotor learning based on on-going sensory invariance. We believe 
that the insight developed in this paper can help build a more autonomous 
agent with a semantics grounded on its own sensorimotor capacity, for its own 
sake. Such agents with autonomous understanding will be necessary for a major 
breakthrough in the future of information technology. 
Abstract. We describe the response properties of a compact, low power, analog
circuit that implements a model of a leaky integrate-and-fire (I&F) neuron, with
spike-frequency adaptation, refractory period and voltage threshold modulation
properties. We investigate the statistics of the circuit’s output response by modulating
its operating parameters, such as refractory period and adaptation level, and by
changing the statistics of the input current. The results show a clear match with
theoretical prediction and neurophysiological data in a given range of the parameter
space. This analysis defines the chip’s parameter working range and predicts
its behavior in case of integration into large massively parallel very-large-scale-integration
(VLSI) networks.



1 Introduction 

Models of spiking neurons have complex dynamics that require intensive computational 
resources and long simulation times. This is especially true for conductance-based models
that describe in detail the electrical dynamics of biological neurons [1]. These models
include non-linear voltage-dependent membrane currents and are difficult to analyze 
analytically and to implement. For this reason, phenomenological spiking neuron mod- 
els are more popular for studies of large network dynamics. In these models the spikes 
are stereotyped events generated whenever the membrane voltage reaches a threshold. 
The Integrate-and-Fire (I&F) model neuron, despite its simplicity, captures many of 
the broad features shared by biological neurons. This model can be easily implemented 
using analog very-large-scale-integration (VLSI) technology and can be used to build
low power, massively parallel, large recurrent networks, providing a promising tool for 
the study of neural network dynamics [2,3]. 

VLSI I&F neurons integrate presynaptic input currents and generate a voltage pulse 
when the integrated voltage reaches a threshold. A very simple circuit implementation 
of this model, the “Axon-Hillock” circuit, has been proposed by Mead [4]. In this circuit 
an integrating capacitor is connected to two inverters and a feedback capacitor. A pulse is 
generated when the integrated voltage crosses the switching threshold of the first inverter. 
An alternative circuit, proposed in [5], exhibits more realistic behavior, as it implements
spike-frequency adaptation and has an externally set threshold voltage for the spike
emission. Both circuits, however, have a large power consumption due to the fact that
the input to the first inverter (the integrated voltage on the capacitor) changes slowly,
typically with time constants of the order of milliseconds, and the inverter spends a large
amount of time in the region in which both transistors conduct a short-circuit current.
The power consumption is reduced, but not optimized, in the circuit described in [6],
which uses an amplifier at the input to compare the voltage on the capacitor with a desired
spiking threshold voltage. As the input exceeds the spiking threshold, the amplifier
drives the inverter, making it switch very rapidly. In [7] Boahen demonstrates how
it is possible to implement spike-frequency adaptation by connecting a four-transistor
“current-mirror integrator” in negative-feedback mode to any I&F circuit. An I&F circuit
optimized with respect to power consumption but lacking spike-frequency adaptation
mechanisms, voltage threshold modulation, refractory period and explicit leak current is
described in [8]. We designed a compact leaky I&F circuit, similar to previously proposed
ones, that additionally is low power and has spike-frequency adaptation, refractory
period and voltage threshold modulation properties [9]. In this work we characterize the
circuit and compare its response properties to the ones predicted by theory and observed 
in neocortical pyramidal cells. Specifically we measured the response function of the 
circuit to noisy input signals, by varying both circuit parameters and the parameters that 
control the statistics of the input current. The results described in this paper present a 
description of the integrated-circuit’s data in neurophysiological terms, in order to reach 
a wider scientific community. With this approach we address important questions like the 
feasibility of simulation of large networks of spiking neurons built using analog VLSI 
circuits. 



2 The I&F Circuit 

The I&F neuron circuit is shown in Fig. 1. The circuit comprises a source follower M1-M2,
used to control the spiking threshold voltage; an inverter with positive feedback
M3-M7, for reducing the circuit’s power consumption; an inverter with controllable
slew-rate M8-M11, for setting arbitrary refractory periods; a digital inverter M13-M14,
for generating digital pulses; a current-mirror integrator M15-M19, for spike-frequency
adaptation; and a minimum size transistor M20 for setting a leak current.



2.1 Circuit Operation 

The input current is integrated linearly by C_mem onto V_mem. The source-follower
M1-M2 produces V_in = κ(V_mem - V_sf), where V_sf is a constant sub-threshold bias
voltage and κ is the sub-threshold slope coefficient [10]. As V_mem increases and V_in
approaches the threshold voltage of the first inverter, the feedback current I_fb starts
to flow, increasing V_mem and V_in more rapidly. The positive feedback has the effect
of making the inverter M3-M5 switch very rapidly, dramatically reducing its power
dissipation.

A spike is emitted when V_mem is sufficiently high to make the first inverter switch,
driving V_spk and V_o2 to V_dd. During the spike emission period (for as long as V_spk is
high), a current with amplitude set by V_adap is sourced into the gate-to-source parasitic
capacitance of M19 on node V_ca. Thus, the voltage V_ca increases with every spike,
and slowly leaks to zero through leakage currents when there is no spiking activity. As
Fig. 1. Circuit diagram of the I&F neuron.



V_ca increases, a negative adaptation current I_adap, exponentially proportional to V_ca, is
subtracted from the input, and the spiking frequency of the neuron is reduced over time.

Simultaneously, during the spike emission period, V_o2 is high, the reset transistor
M12 is fully open, and C_mem is discharged, bringing V_mem rapidly to Gnd. As V_mem
(and V_in) go to ground, V_o1 goes back to V_dd, turning M10 fully on. The voltage V_o2 is
then discharged through the path M10-M11, at a rate set by V_rfr (and by the parasitic
capacitance on node V_o2). As long as V_o2 is sufficiently high, V_mem is clamped to ground.
During this “refractory” period, the neuron cannot spike, as all the input current I_inj is
absorbed by M12.

The adaptation mechanism implemented by the circuit is inspired by models of its
neurophysiological counterpart [11,12,13]: the voltage V_ca, functionally equivalent to
the calcium concentration [Ca2+] in a real neuron, is increased with every spike and
decays exponentially to its resting value; if the dynamics of V_ca is slow compared to
the inter-spike intervals then the effective adaptation current is directly proportional to
the spiking rate computed in some temporal window. This result has been extensively
applied to investigate the steady-state responses [14,15] and the dynamic properties [15]
of adapted neurons.
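
The adaptation mechanism just described can be illustrated with a toy simulation. This is a sketch under strong simplifications, not a model of the circuit: the adaptation current is taken to be linear in a calcium-like variable ca (the circuit's dependence on V_ca is exponential), the membrane has unit capacitance and no leak or positive feedback, and every parameter value and the names simulate and intervals are hypothetical.

```python
def simulate(i_in=20.0, g_adapt=30.0, t_end=2.0, dt=1e-4,
             v_thresh=1.0, ca_step=0.1, tau_ca=0.5):
    """Return spike times of a toy I&F unit with a calcium-like adaptation variable."""
    v, ca, spikes = 0.0, 0.0, []
    for k in range(int(t_end / dt)):
        v += dt * (i_in - g_adapt * ca)   # integrate net current (unit capacitance)
        ca -= dt * ca / tau_ca            # exponential decay toward rest
        if v >= v_thresh:
            spikes.append(k * dt)
            v = 0.0                       # reset
            ca += ca_step                 # each spike increments the adaptation variable
    return spikes


def intervals(spikes):
    """Inter-spike intervals of a spike train."""
    return [b - a for a, b in zip(spikes, spikes[1:])]


adapted = intervals(simulate())
plain = intervals(simulate(g_adapt=0.0))
print(f"first/last ISI with adaptation: {adapted[0]:.3f} s / {adapted[-1]:.3f} s")
print(f"first/last ISI without:         {plain[0]:.3f} s / {plain[-1]:.3f} s")
```

With adaptation switched on, the inter-spike interval lengthens over the first few spikes and then settles, whereas without it the interval stays constant; the time-averaged value of ca in steady state is ca_step * tau_ca times the firing rate, which is the proportionality noted in the text.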

Figure 2(a) shows an action potential generated by injecting a constant current I_inj
into the circuit and activating both spike-frequency adaptation and refractory period
mechanisms. Figure 2(b) shows how different refractory period settings (V_rfr) saturate
the maximum firing rate of the circuit at different levels.
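
The saturating effect of the refractory period can be seen in the textbook expression for an ideal, non-leaky I&F neuron driven by a constant current: the inter-spike interval is the charging time C V_th / I plus the refractory period. The sketch below uses that idealization, which ignores the leak, positive feedback and adaptation of the actual circuit; the function name firing_rate and all parameter values are hypothetical.

```python
def firing_rate(i_in, c_mem=1e-12, v_thresh=1.0, t_refr=0.0):
    """Ideal (non-leaky) I&F rate: 1 / (refractory period + charging time)."""
    if i_in <= 0:
        return 0.0
    return 1.0 / (t_refr + c_mem * v_thresh / i_in)


for t_refr in (0.0, 1e-3, 5e-3):
    rates = [firing_rate(i, t_refr=t_refr) for i in (1e-10, 1e-9, 1e-8, 1e-7)]
    print(t_refr, [round(r, 1) for r in rates])
```

With t_refr = 0 the rate is proportional to the input current, and for any positive t_refr it can never exceed 1 / t_refr, so each refractory setting caps the f-I curve at its own level.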
Fig. 2. (a) Measured data (circles) representing an action potential generated for a constant input
current with spike-frequency adaptation and refractory period mechanisms activated. The
data is fitted with the analytical model of eq. (5) (solid line). (b) Circuit’s f-I curves (firing rate
versus input current I_inj) for different refractory period settings.



2.2 Modeling the Neuron’s Subthreshold Behavior 

The circuit presented does not implement a simple linear model of an I&F neuron. Rather,
its positive feedback and spike-frequency adaptation mechanisms represent additional features
that increase the model’s complexity (and hopefully its computational capabilities). The
overall current that the circuit receives is I_in + I_fb - I_adap, where I_in is the circuit’s
input current I_inj minus the leak current I_leak (see Section 2.3), I_fb is the positive
feedback current and I_adap is the adaptation current generated by the spike-frequency
adaptation mechanism. We can use the transistor’s weak-inversion equations [10] to
compute the adaptation current:

I_{adap} = I_0 e^{\kappa V_{ca} / U_T}    (1)

where I_0 is the transistor’s dark current [10] and U_T is the thermal voltage.

If we denote with C_a the parasitic gate-to-source capacitance on node V_ca of M19,
and with C_p the parasitic gate-to-drain capacitance on M19, then:

V_{ca} = V_{ca0} + \gamma V_{mem}    (2)

where \gamma = C_p / (C_p + C_a) and V_ca0 is the steady-state voltage stored on C_a, updated with each
spike.

To model the effect of the positive feedback we can assume, to first order approximation,
that the current mirrored by M3, M7 is:

I_{fb} = I_1 e^{\kappa V_{in} / U_T}    (3)

where I_1 is a constant current flowing in the first inverter when both M4, M5 conduct,
and V_in = κ(V_mem - V_sf) is the output of the source-follower M1, M2.

The equation modeling the subthreshold behavior of the neuron is:

C_0 \frac{d}{dt} V_{mem} = I_{in} + I_{fb} - I_{adap}    (4)

where C_0 = C_m + γ C_a. Substituting I_adap and I_fb with the equations derived above
we obtain:

C_0 \frac{d}{dt} V_{mem} = I_{in} + I_1 e^{\kappa^2 (V_{mem} - V_{sf}) / U_T} - I_0 e^{\kappa (V_{ca0} + \gamma V_{mem}) / U_T}    (5)

We fitted the experimental data by integrating eq. (5) numerically and using the 
parameters shown in Table 1 (see solid line of Fig. 2(a)). The initial part of the fit (for 
low values of V_mem) is not ideal because the equations used to model the source follower
M1, M2 are correct only for sufficiently high values of V_mem.



Table 1. Parameters used to fit the data of Fig. 2(a)

C_m = 0.66 pF    I_in = 177 pA    V_sf = 0.5 V
C_a = 0.12 pF    I_1 = 2.29 pA    V_ca0 = 50 mV
C_p = 500 fF     [...]            [...]



2.3 Stimulating the Neuron Circuit 

To inject current into the neuron circuit we use an on-chip p-type transistor operating 
in the weak-inversion domain [10]. By changing the transistor’s gate voltage we can 
generate the current: 



hnj = (6) 

where $V_p$ is the p-type transistor’s gate voltage that we can control. If we take into 
account the leak current $I_{leak}$ sourced by the transistor $M_{20}$ of Fig. 1 we can write the 
net input current to the circuit as: 

hn = hnj - heak = / 0 p 6 ^ ) (7) 



On the other hand, the desired input current that we want to inject into the neuron is: 

Ides = Ido ■ er) ( 8 ) 
Fig. 3. f-I curves for different reference voltages $V_{p0}$, with the refractory period set to zero. The 
relation between the input current and the output frequency is linear. 



where $I_{d0}$ is a normalizing factor and $\eta$ represents a stochastic input signal ranging from 
zero to one, characterized by a mean value $\mu$ and standard deviation (STD) $\sigma$. 

We can force the net input current to be the desired input current $I_{des}$ if we break 
up the current source gate voltage $V_p$ in the following way: 

$$V_p = r_1 V_{in} + r_2 V_{p0} \qquad (9)$$

where $V_{p0}$ is a constant reference voltage, $V_{in}$ is the voltage encoding the signal $\eta$ 
(controlled by a PC-IO card), and $r_1$ and $r_2$ are the factors of a resistive divider used to 
scale and sum the two voltages $V_{p0}$ and $V_{in}$. In this case the net input current becomes: 

hn = - heak ( 10 ) 

which can be simplified to 

= -heak ( 11 ) 

with constant 

Ip = (12) 

If we map the signal 77 onto Vm in a way that 

Vin = - 77 ( 13 ) 
Fig. 4. f-I curves measured for three different values of $\sigma$. Note, for $\sigma = 0$ (cross marker), the 
presence of high non-linearity at the rheobase. For increasing $\sigma$ the behavior at the rheobase is 
linearized. 



we make use of the full PC-IO card dynamic range (from -5V to +5V) and obtain the 
desired current $I_{in} = I_{des}$, provided that the leak current is set to: 

heak=Ipe-^''^'^^^ (14) 



and that $I_{d0}$ of eq. (8) is: 



= (15) 

In Fig. 3 we show the effect of $V_{p0}$ (which affects $I_{d0}$ exponentially) on the f-I curve 
measured from the circuit, for increasing values of the mean input current with $\sigma = 0$. 



3 Results 

We first tested the neuron with the adaptation mechanism turned off, injecting an input 
current with statistics parameterized by mean $\mu$ and STD $\sigma$. 



3.1 General Properties of the I&F Circuit 

We measured the I&F circuit’s f-I curves as a function of its input current $I_{d0}\eta$. The 
signal $\eta$ was generated to reproduce white noise with mean $\mu$ and STD $\sigma$. Figure 4 
shows the f-I curves for three different values of STD. All the curves were obtained by 
setting the neuron’s refractory period to approximately 6.6 ms ($V_{rfr}$ = 280 mV). 



196 D. Ben Dayan Rubin, E. Chicca, and G. Indiveri 



The circuit’s firing rate $f$ has a dependence on the refractory period ($\tau_r$) of the 
type [16]: 



/ 




(16) 



Figure 2(b) shows f-I curves obtained for three different values of $V_{rfr}$ ($\tau_r$). The curves 
tend, in the limit of $\tau_r \to 0$, to a straight line with slope inversely proportional to the 
circuit’s spiking threshold voltage (as shown in Fig. 3). 
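
The limiting behavior described above can be illustrated with the textbook relation for a 
non-leaky integrator with a refractory period, $f = 1 / (\tau_r + C V_{thr} / I)$. This 
relation is an assumption used for illustration and is not claimed to be the exact form of 
eq. (16); the function name and the threshold value are hypothetical. 

```python
def firing_rate(i_in, c_mem, v_thr, tau_r):
    """Firing rate (Hz) of an idealized non-leaky integrate-and-fire neuron.

    i_in  : constant input current (A)
    c_mem : membrane capacitance (F)
    v_thr : spiking threshold voltage (V), hypothetical value in the examples
    tau_r : refractory period (s)
    """
    if i_in <= 0.0:
        return 0.0
    # Time to charge the capacitor to threshold, plus the refractory period.
    return 1.0 / (tau_r + c_mem * v_thr / i_in)
```

With `tau_r = 0` the rate is `i_in / (c_mem * v_thr)`, a straight line whose slope is 
inversely proportional to the threshold; with `tau_r > 0` the rate saturates below `1 / tau_r`. 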

We measured the distribution of the inter-spike intervals (ISIs) generated by the 
circuit for two values of $\sigma$ = {0.05, 0.1}, sweeping the mean input current $I_{d0}\eta$. To 
analyze the statistics of these distributions, we computed their coefficient of variation 
(CV), given by the ratio between the STD and the mean of the neuron’s ISI [17,18]. In 
Fig. 5 we plot the CVs against the neuron’s output frequency. The CVs are in accordance 
with theoretical [19] and experimental studies on neurons of layers 4 and 5 of the rat [14]. 
The ISI distribution for increasing input currents shifts toward lower mean ISI, and 
its STD decreases. The refractory period constrains the distribution to remain above a 
certain ISI even if its STD decreases with the current. In the theoretical limit of a Poisson 
(memoryless renewal) process the mean and the STD of the ISI distribution should be 
approximately equal. By increasing the mean afferent current the CV decreases, because the 
probability of remaining above the spiking threshold increases, reducing the stochasticity 
of the spiking event. 
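
The CV computation can be sketched in a few lines of Python. This is an illustrative 
helper (the name `isi_cv` is hypothetical), using the population standard deviation of 
the ISIs; the choice of estimator is an assumption. 

```python
import statistics


def isi_cv(spike_times):
    """Coefficient of variation of the inter-spike intervals.

    spike_times: increasing sequence of spike times (any consistent unit).
    Returns STD(ISI) / mean(ISI).
    """
    isis = [b - a for a, b in zip(spike_times, spike_times[1:])]
    if len(isis) < 2:
        raise ValueError("at least three spikes are needed")
    return statistics.pstdev(isis) / statistics.mean(isis)
```

A perfectly regular spike train gives a CV of zero, whereas a Poisson train gives a CV 
close to one. 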



[Figure 5 plot: CV (vertical axis, 0 to 1) against Frequency (Hz) (horizontal axis, 0 to 100), for $\sigma$ = 0.05 and $\sigma$ = 0.1.] 

Fig. 5. Coefficients of variation of the I&F neuron’s ISIs for two different values of $\sigma$ plotted against 
output frequency. Higher values of $\sigma$ produce higher spike decorrelation, similar to what is observed in 
Poisson processes (CV close to one). 
3.2 Effects of the Adaptation on the I&F Circuit 

Here we consider how the spike-frequency adaptation mechanism influences the I&F 
neuron’s behavior. We analyzed the response of the circuit to a series of depolarizing 
current steps with increasing values of $\mu$ (with $\sigma = 0$) and with different values of the 
spike-frequency adaptation rate, both in the transient regime and in the steady-state 
regime. 



Dynamic Firing Properties. The neuron responds to current steps with instantaneous 
firing rates that progressively adapt to lower (steady-state) values (see Fig. 6). The 
circuit’s adaptation current $I_{adap}$ is integrated by a non-linear integrator (see $M_{15}$-$M_{19}$ 
of Fig. 1) and increases progressively with every spike (see also Section 2.2). As $I_{adap}$ 
is subtracted from the input current, the neuron’s net input current progressively 
decreases, together with its output firing rate. In the steady state an equilibrium is reached 
when the adaptation current is balanced with the output firing rate (significantly lower 
than the initial one). 
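
The qualitative mechanism described above can be reproduced with a deliberately simplified 
model. The sketch below is not a model of the circuit’s non-linear integrator: it assumes a 
non-leaky integrator in which each spike adds a fixed increment `delta_adap` to an adaptation 
current that decays with time constant `tau_adap`. All parameter values and function names 
are hypothetical, apart from the input current and membrane capacitance taken from Table 1. 

```python
def simulate_adapting_neuron(i_in, c_mem=0.66e-12, v_thr=1.0,
                             delta_adap=5e-12, tau_adap=0.5,
                             dt=1e-5, t_max=1.0):
    """Return spike times (s) of a simplified adapting integrate-and-fire neuron."""
    v, i_adap, t = 0.0, 0.0, 0.0
    spikes = []
    while t < t_max:
        # The adaptation current is subtracted from the input current.
        v += dt * (i_in - i_adap) / c_mem
        v = max(v, 0.0)                      # keep the membrane voltage non-negative
        i_adap -= dt * i_adap / tau_adap     # slow decay between spikes
        if v >= v_thr:
            spikes.append(t)
            v = 0.0                          # reset after the spike
            i_adap += delta_adap             # each spike increases the adaptation current
        t += dt
    return spikes


def instantaneous_rates(spikes):
    """Instantaneous frequency (Hz) computed as the inverse of each ISI."""
    return [1.0 / (b - a) for a, b in zip(spikes, spikes[1:])]
```

For a constant step `i_in = 177e-12`, the instantaneous rate starts high and decays 
toward a lower steady-state value, as in the measured f-t curves. 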




Fig. 6. Instantaneous frequency response of the circuit with $V_{adap}$ = 4.19 V, for increasing values 
of input step amplitudes (decreasing values of $V_p$). The abscissa of each data point corresponds 
to the spike time from the input step onset. 



In Fig. 6 we show different instantaneous frequency response curves over time (f-t 
curves) for increasing values of the input current’s step amplitude and for a fixed 
adaptation setting. Similar to what has been observed experimentally [20], the adaptation 
rate increases and the instantaneous frequency response decay time decreases with 
higher input step amplitudes. 
Fig. 7. Instantaneous frequency response for different adaptation rates. The neuron adapts more 
quickly as the adaptation rate increases ($V_{adap}$ decreases), and the corresponding steady-state 
output frequency is lower. 



In Fig. 7 we plotted different f-t curves for different values of the adaptation rate. 
The data plotted shows how increasing levels of adaptation shorten the time required by 
the neuron to adapt and to reach a mean steady-state value. 



Steady-State Firing Properties. Figure 8 shows two steady-state f-I curves measured 
for two different spike-frequency adaptation rates. Increasing values of adaptation rate 
decrease the overall steady-state firing rate $f$, as shown also in Fig. 7. The inset of Fig. 8 
shows how spike-frequency adaptation has the effect of decreasing the slope of the 
steady-state curves at the rheobase, as predicted by theoretical [11] and experimental [14, 
15] evidence. 

4 Conclusions 

We presented a novel analog VLSI circuit that implements a real-time model of a leaky 
I&F neuron. We characterized its response properties in a wide range of conditions, as 
a function of both the circuit’s parameters and the statistics of the input signals. One 
of the most interesting properties of the circuit is its ability to model spike-frequency 
adaptation. We activated this feature, characterized the circuit, and showed how it exhibits 
different adapting behaviors when its operating conditions change. The inclusion of the 
adaptation mechanism addresses the question of which neurophysiological parameters 
in real neurons (spike-induced $Ca^{2+}$ influx, $[Ca^{2+}]$ decay time, ionic conductances) are 
actually captured by the VLSI circuit. Ahmed et al. [20] reported that spike frequency 
Fig. 8. f-I curves of the steady-state response of the adapted neuron for two different values of 
spike-frequency adaptation rate. The figure inset shows a detail of the f-I curve at low levels of 
current injection, confirming the adaptation-induced linearization at the rheobase. 



adaptation to a current step in neurons of the cat primary cortex can be well fitted by a 
single exponential curve depending on the degree of adaptation. This behavior is well 
captured by our circuit (see Fig. 6): the exponential rate decay is observed for low 
values of input currents, and the degree of adaptation can be set with $V_{adap}$. The results 
presented here, together with the circuit’s low-power characteristics [9], make it suitable 
for integration in very large arrays that also contain synaptic circuits [2,7,21], and for the 
construction of massively parallel analog VLSI networks of spiking neurons. 
Abstract. After defining a Universe for computer science in opposition 
to the Universe of biology, this paper presents the roles that cellular 
division plays in both of them. Based on the nine construction rules of 
the so-called Tom Thumb algorithm, cellular division leads to a novel self- 
replicating loop endowed with universal construction and computation. 
The self-replication of the totipotent cell of the “LSL” acronym serves 
as an artificial cell division example of the loop and results in the growth 
and differentiation of a multicellular organism. 



1 Introduction 

The Embryonics project (for embryonic electronics) aims at creating radically 
new computing machines inspired by Nature and able to grow, to self-repair, 
and to self-replicate. 

The embryonic development of living beings is extremely complex. Although 
numerous partial results have already been reported [14], major controversies 
remain about the basic mechanisms which trigger the development of an 
organism and, more precisely, about the internal increase of complexity of a 
growing being [1]. 

Embryonic development can be roughly described as the construction of 
a three-dimensional carbon-based organism from a one-dimensional blueprint, the 
genome, assuming that a number of external conditions are satisfied (food, 
temperature, etc.). Of course, the developmental process of complex organisms 
involves processes that are not completely specified within the genome, but are 
rather heavily influenced by the environment (monozygotic twins grow up to 
become different individuals). However, for simpler organisms this observation 
is not necessarily as true: for example, the nematode worm Caenorhabditis 
elegans, a real star in the studies of the molecular biology of development, invariably 
develops into an adult hermaphrodite of exactly 945 cells, as long as 
environmental conditions are acceptable. Developmental processes based exclusively on 
the information stored in the genome are therefore possible, and it is then 
reasonable to begin exploring a developmental approach in an entirely new milieu 
by limiting the scope of research to the relatively simpler mechanisms involved 
in genome-based development. 
These mechanisms fascinate engineers who dream of developing computing 
machines mimicking living organisms in a completely different environment, the 
two-dimensional world of silicon. Our Embryonics project aims at creating such 
machines which, starting from a one-dimensional blueprint, an artificial genome, 
will be able to grow and give birth to computers endowed, like their living models, 
with original properties such as self-repair and self-replication. These embryonic 
machines would be best suited for harsh environments: space exploration, nuclear 
plants, avionics, etc. 

We shall briefly visit the Universe of biology and recall some fundamental 
mechanisms, notably cellular division and cellular differentiation which consti- 
tute our main source of inspiration. We will then come back to the Universe of 
computer science to embed these mechanisms into silicon, and to show how it is 
possible to design actual computing machines able to self-replicate. 

In Sect. 2, after a short reminder of cellular division in living beings, we will 
show that a new algorithm, the Tom Thumb algorithm, will make it possible to 
design a self-replicating loop that can easily implement artificial cellular division 
in silicon. This algorithm will be illustrated by means of a minimal unicellular 
organism, the Annulus elegans, composed of four molecules. This mother cell will 
grow and then divide, triggering the growth of two daughter cells. This example is 
sufficient for deriving the nine rules which constitute the Tom Thumb algorithm. 
Sect. 3 deals with the generalization of the methodology previously described 
and its application to the growth and cellular differentiation of a multicellular 
organism, the Acronymus elegans, which implements the “LSL” acronym (for 
Logic Systems Laboratory). Sect. 4 will conclude with a brief discussion of the 
internal increase of complexity, as well as a first and rudimentary calculation 
of the complexity of Annulus elegans and Acronymus elegans. 

2 The Tom Thumb Algorithm for Artificial Cellular 
Division 

2.1 Cell Division in the Universe of Biology 

Before describing our new algorithm for the division of an artificial cell, let us 
recall the roles that cellular division plays in the existence of living organisms 
[2] (p. 206). 

“When a unicellular organism divides to form duplicate offspring, the division 
of a cell reproduces an entire organism. But cell division also enables multicellular 
organisms, including humans, to grow and develop from a single cell, the fertilized 
egg. Even after the organism is fully grown, cell division continues to function 
in renewal and repair, replacing cells that die from normal wear and tear or 
accidents. For example, dividing cells in your bone marrow continuously supply 
new blood cells. The reproduction of an ensemble as complex as a cell cannot 
occur by mere pinching in half; the cell is not like a soap bubble that simply 
enlarges and splits in two. Cell division involves the distribution of identical 
genetic material (DNA) to two daughter cells. What is most remarkable about 
cell division is the fidelity with which the DNA is passed along, without dilution, 
from one generation of cells to the next. A dividing cell duplicates its DNA, 
allocates the two copies to opposite ends of the cell, and only then splits into 
two daughter cells”. 

In conclusion, we can summarize the two key roles of cell division. 

— The construction of two daughter cells in order to grow a new organism or 
to repair an already existing one (genome translation). 

— The distribution of an identical set of chromosomes in order to create a copy 
of the genome from the mother cell aimed at programming the daughter cells 
(genome transcription). 

Switching to the Universe of computer science, we will propose a new algo- 
rithm, the Tom Thumb algorithm, which, starting with a minimal cell made up 
of four artificial molecules, the Annulus elegans, constructs both the daughter 
cells and the associated genomes. A tissue of such molecules will in the end be 
able to constitute a multicellular organism endowed with cellular differentiation. 

2.2 Definition of the Universe of Computer Science 

The implementation of developmental mechanisms in the world of silicon repre- 
sents a complex challenge. Cellular division, notably, is a process that is inher- 
ently physical, in that it involves, if not the creation, at least the manipulation 
of matter. Unfortunately, current technology does not allow such manipulation 
where electronic circuits are concerned, and developmental approaches in the 
world of computer science must then be applied to information rather than 
matter. 

Fortunately, the technology of field-programmable gate arrays (FPGAs) al- 
lows a simple transition between information (the bitstream that configures the 
circuit) and matter (the actual circuit implemented on the FPGA). This transi- 
tion is the key background for the approach described in this article, where the 
basic unit, our artificial cell, is represented as a string of hexadecimal characters 
which in fact represents the configuration required to implement the actual cell 
(i.e., the logic gates that implement the cell’s functionality) in a custom FPGA 
of our own design. 

In general, our approach is based on a hierarchical system of growing com- 
plexity, with organisms (computing machines dedicated to a user-defined task) 
being made up of cells (small processing elements), themselves assembled start- 
ing with simpler components, the molecules (the elements of our custom FPGA). 
Our developmental mechanisms shall then operate by allowing the set of molecu- 
lar configurations that implement a cell to replicate itself, implementing a process 
not unlike the cellular division that underlies the growth of biological organisms. 
A different set of mechanisms will then be used to assign a specific function to 
each of the cells (cellular differentiation). 

A practical way to verify the realization of such novel mechanisms in the 
world of computer science is to approach the problem through the creation of 
an artificial Universe, defined by a container, a content, and a set of rules. 
The container is a two-dimensional flat space, divided into rows and columns 
(Fig. 1a). Each intersection of a row and a column defines a rectangle or molecule, 
which is divided into three memory positions: left, central, and right. Time flows in 
discrete clock times, the time steps, identified by integers (t = -1, 0, 1, 2, ...). 

The content of this Universe is constituted by a finite number of symbols, each 
represented by a hexadecimal character ranging from 0 to E, that is, from 0000 
to 1110 in binary (Fig. 1b). These symbols are either empty data (0), molcode 
data (for molecule code data, M = 1 to 7) or flag data, each indicating one 
of the four cardinal directions: north, east, south, west (F = 8 to E). Molcode 
data will be used for configuring our final artificial organism, while flag data 
are indispensable for constructing the skeleton of the cell. Furthermore, each 
character is given a status and will eventually be mobile data (white character), 
indefinitely moving around the cell, or fixed data (grey character), definitely 
trapped in a memory position of the cell (Fig. 1c). The original genome for the 
minimal cell is organized as a string of six hexadecimal characters, i.e. half the 
number of characters in the cell, moving counterclockwise by one character at 
each time step (t = 0, 1, 2, ...) (Fig. 1a). 
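
The symbol alphabet described above can be made concrete with a small Python sketch. The 
function name `classify_symbol` and the returned labels are hypothetical; only the value 
ranges (0 for empty data, 1 to 7 for molcode data, 8 to E for flag data) come from the text. 

```python
def classify_symbol(ch):
    """Classify one hexadecimal character of the artificial Universe."""
    if len(ch) != 1:
        raise ValueError("expected a single hexadecimal character")
    value = int(ch, 16)          # raises ValueError for non-hexadecimal input
    if value == 0:
        return "empty"
    if 1 <= value <= 7:
        return "molcode"
    if 8 <= value <= 0xE:
        return "flag"
    # The character F (1111 in binary) lies outside the range 0 to E.
    raise ValueError("symbol outside the range 0 to E")
```

For instance, `classify_symbol("3")` yields `"molcode"` and `classify_symbol("E")` yields `"flag"`. 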

The set of rules defines the behavior of the content of the Universe. It 
comprises 9 rules, used to construct the cells and to implement cellular 
division (growth). 

A Universe is, of course, defined as a function of the kind of cells required 
to execute a given application. To provide an overview of the approach, in the 
present article the Universe is defined so as to implement the cells of the very 
simple organism Annulus elegans. It should be noted, however, that our approach 
is perfectly scalable: should an application need more complex organisms, it is 
possible to extend both the size of the molecules’ memory and the number of 
molecules in each cell without altering the basic algorithm in the least. 

2.3 Constructing the Cell 

The first three rules (rules 1 to 3) allow for the construction of a first path 
closing on itself: a loop (Fig. 3) which will constitute the mother cell. At each 
time step, a character of the original genome, always beginning with a flag F, 
is shifted from right to left and stored in the lower leftmost molecule (Fig. 1a 
and 3). The construction of the cell, i.e. storing the fixed data and defining the 
paths for mobile data, depends on three patterns (Fig. 2). 

— If the three memory positions of a molecule are empty (blank squares), the 
flag is shifted by one position to the right. Similarly, if the two rightmost 
memory positions of a molecule are empty, the flag is shifted by one position 
to the right (shift data: rule 1). 

— If the rightmost memory position is empty and the two leftmost memory 
positions hold flags (F), the characters are shifted by one position to the right 
(load flag: rule 2). In this situation, the rightmost F' character is trapped 
in the molecule (fixed data), and a new connection is established from the 
central position toward the northern, eastern, southern or western molecule, 
depending on the fixed flag information (F' = 8 or 9, A or E, B, C or D). 
[Figure 1 legend, partially recovered: north branch and east connection flag (E); status of each symbol: mobile data, fixed data.] 

Fig. 1. Our artificial Universe. (a) The container with the genome of a minimal cell. 
(b) Graphical and hexadecimal representations of the symbols. (c) Graphical 
representation of the status of each symbol. 



— If the rightmost memory position is empty, while the central and leftmost 
memory positions hold a flag (F) and a molcode (M) respectively, then 
the characters are shifted by one position to the right (load molcode and 
flag: rule 3). In this case, both characters are trapped in the molecule (fixed 
data), and a new connection is launched from the leftmost position toward 
the northern, eastern, southern or western molecule, depending on the fixed 
flag information (F = 8 or 9, A or E, B, C or D). 



At time t = 12, twelve characters, i.e. twice the contents of the original 
genome, have been stored in the twelve memory positions of the cell (Fig. 3). 
Six characters are fixed data, forming the phenotype of the final cell, and the six 
remaining ones are mobile data, composing a copy of the original genome, the 
genotype. Both translation (i.e. construction of the cell) and transcription (i.e. 
copy of the genetic information) have therefore been achieved. 
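
The three construction patterns (rules 1 to 3) can be summarized as a small pattern 
matcher. The Python sketch below only decides which rule applies to the three memory 
positions of one molecule; it does not move data or open connections. The function names 
and the use of `None` for "no construction rule applies" are assumptions made for illustration. 

```python
EMPTY = "0"


def kind(ch):
    """Return 'empty', 'molcode' or 'flag' for a hexadecimal character 0 to E."""
    value = int(ch, 16)
    if value == 0:
        return "empty"
    if 1 <= value <= 7:
        return "molcode"
    if 8 <= value <= 0xE:
        return "flag"
    raise ValueError("symbol outside the range 0 to E")


def matching_rule(left, central, right):
    """Return 1, 2 or 3 for the construction rule matched by a molecule, else None."""
    if right != EMPTY:
        return None                      # all three rules need an empty rightmost position
    if kind(central) == "empty":
        return 1                         # shift data: the two rightmost positions are empty
    if kind(central) == "flag" and kind(left) == "flag":
        return 2                         # load flag
    if kind(central) == "flag" and kind(left) == "molcode":
        return 3                         # load molcode and flag
    return None
```

For example, `matching_rule("3", "E", "0")` returns 3, the pattern in which a molcode and a 
flag are both about to be trapped in the molecule. 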
[Figure 2 panel, partially recovered: Rule 3, load molcode and flag (general case).] 

Fig. 2. The three rules for constructing a cell. 



The fixed data trapped in the rightmost memory position(s) of each molecule 
remind us of the pebbles left by Tom Thumb for memorizing his way. The min- 
imal artificial organism will be henceforth designated as Annulus elegans. 

In order to grow an artificial organism in both horizontal and vertical direc- 
tions, the mother cell should be able to trigger the construction of two daughter 
cells, northward and eastward. Two new rules (rules 4 and 5) are thus necessary. 

At time t = 8 (Fig. 3 and 4a), we observe a pattern of characters which 
is able to start the construction of the northward daughter cell (rule 4). The 
upper leftmost molecule is characterized by two specific signals, i.e. a fixed flag 
indicating a north branch (F = E) and a branch activation flag (F = 8) ready 
to enter the leftmost memory position. 

At time t = 17 (Fig. 3 and 4b), another particular pattern of characters will 
start the construction of the eastward daughter cell (rule 5). The lower rightmost 
molecule is characterized by two specific signals, i.e. a fixed flag indicating an 
east branch (F = D), and the branch activation flag (F = 8) in the leftmost 
memory position. 



2.4 Growing a Multicellular Organism 

In order to analyze the growth of a multicellular artificial organism, we are led 
to carefully observe the interactions of the different paths created inside and 
outside each individual cell. Numerous hazards are threatening the development 
of the different cells, and four new rules (rules 6 to 9) are necessary for avoiding 
collisions at the crossroads. 










Fig. 4. The two rules triggering the paths to the north and east molecules. 

[Figure 5 panels: mother loop and daughter loop at t = 29, with rule 5 (eastward connection) and rule 2 with F = D (westward connection); at t = 30, rule 6 gives priority to rule 2 (westward direction) over rule 5 (eastward direction).] 

Fig. 5. Rule 6, defining the priority of the east-west direction over the west-east one. 



Rule 6, detailed in Fig. 5, arbitrates the conflict between an eastward branch 
launched by the mother cell (rule 5) and the simultaneous construction of a 
westward path starting from a daughter cell, at the right of the mother cell 
(rule 2 with F = D). In such a case, we give priority to the east-west 
direction over the west-east direction. This conflict may be represented by the 
simplified schema of Fig. 6a where only the daughter loop is represented. Such 
a schema can also be used for representing rule 7 (Fig. 6b: priority east-west 
over south-north), rule 8 (Fig. 6c: priority south-north over west-east), and rule 
9 (Fig. 6d: priority east-west over south-north and west-east). 

The diverse priorities defined by rules 6 to 9 may be described by the following 
relation: 



(east-west) > (south-north) > (west-east) 

which expresses the following choice: a closing loop has priority over all other 
outer paths, which makes the completed loop entirely independent of its neigh- 
bors (rules 6, 7 and 9), and the organism will grow by developing bottom-up 
vertical branches (rule 8). This choice is quite arbitrary and may be changed 
according to other specifications. 
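
The priority relation can be expressed as a simple arbitration function. The Python 
sketch below is only an illustration of the ordering (east-west) > (south-north) > 
(west-east); the names `PRIORITY` and `winner` are hypothetical and the function does not 
model the underlying paths. 

```python
# Directions listed from highest to lowest priority, as in the relation above.
PRIORITY = ["east-west", "south-north", "west-east"]


def winner(directions):
    """Return the direction that wins a collision between competing paths."""
    if not directions:
        raise ValueError("at least one direction is required")
    # A lower index in PRIORITY means a higher priority.
    return min(directions, key=PRIORITY.index)
```

Each of rules 6 to 9 then corresponds to one set of competing directions, for example 
`winner(["west-east", "east-west"])` for rule 6. 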

It is now possible to come back to the detailed representation of a multicel- 
lular organism made up of 2 x 2 minimal cells (Fig. 7) and exhibit it at different 
time steps in accordance with the above mentioned priorities. 
[Figure 6: simplified daughter-loop schemas at successive time steps, e.g. (a) t = 29, 30, 35, 36 and (b) t = 20, 21, 26, 27.] 

Fig. 6. The four priority rules. (a) Rule 6: east-west over west-east. (b) Rule 7: 
east-west over south-north. (c) Rule 8: south-north over west-east. (d) Rule 9: east-west 
over south-north and west-east. 



3 Toward Differentiated Multicellular Organisms 

3.1 Cell Differentiation in the Universe of Biology 

The nematode Caenorhabditis elegans is a small worm that has recently come 
to occupy a large place in the molecular biology of development. Thanks to its very 
particular characteristics (notably, the worm is transparent), it has been possible 
to reconstruct the entire anatomical history of each cell as the fertilized egg 
develops into a multicellular adult [13]. 

Two amazing conclusions emerged when the information gathered from de- 
tailed anatomical studies of living worms was combined with more classical 
anatomical studies obtained by electron microscopy of serial thin sections of 
the worm at different developmental stages. First, each cell in the adult worm 
is derived from the zygote, the first mother cell of the organism, by a virtually 
invariant series of cell divisions called a cell lineage. Second, as a direct con- 
sequence of the invariant cell lineages, individual nematodes are anatomically 
invariant carbon copies of each other. The mature adult hermaphrodite always 
consists of exactly 945 cells. 
Fig. 7. Constructing a multicellular organism made up of 2 x 2 minimal cells. 



Cell differentiation can proceed in at least two very different styles — mosaic 
and regulatory — which presumably reflect profoundly different molecular mech- 
anisms. In mosaic development, or temporal development (so called because the 
organism is assembled from independent parts), the differentiation of a cell does 
not depend on the behavior or even the existence of neighboring cells: appar- 
ently, internal events within each cell determine its actions; such events could be 
triggered by cell division itself or by the ticking of an internal biological clock 
that is set in motion by fertilization. In regulatory development, or spatial devel- 
opment, differentiation is partially or completely dependent on the interaction 
between neighboring cells. 

Mosaic development governed by strict cell lineages is the overwhelming rule 
in C. elegans, and regulative development the exception. In other multicellular 
organisms (Homo sapiens sapiens included), the reverse is almost certainly true. 
The price of such mosaicism may be very small size (perhaps only a limited 
number of cell divisions can be so rigidly programmed) and only modest complexity 
(cell-cell interactions may be required to construct a more elaborate anatomy). 

As we are dealing with the construction of rather complex computing ma- 
chines, we are led to choose the model of regulatory development, or spatial 
development, in the framework of the Embryonics project. 

3.2 Cell Differentiation in the Universe of Computer Science 

Even if the final goal of our project is the development of complex machines, 
in order to illustrate the basic mechanisms of Embryonics we shall use an ex- 
tremely simplified example, the display of the acronym “LSL”, for Logic Systems 
Laboratory. 

The machine that displays the acronym can be considered as a 
one-dimensional artificial organism, Acronymus elegans, composed of three cells 
(Fig. 8a). Each cell is identified by an X coordinate, ranging from 1 to 3 in 
decimal or from 01 to 11 in binary. For coordinate values X = 1 and X = 3, the 
cell should implement the L character, while for X = 2, it should implement the 
S character. A totipotent cell (in this example, a cell capable of displaying either 
the S or the L character) comprises 6 x 7 = 42 molecules (Fig. 8b), 36 of which 
are invariant, five display the S character, and one displays the L character. An 
incrementer (an adder of one modulo 3) is embedded in the final organism; this 
incrementer implements the truth table of Fig. 8c and is represented by the logic 
diagram and symbol of Fig. 8d. According to the table, the value of the binary 
variable X0 is sufficient to distinguish the display of character L (X0 = 1) from 
the display of character S (X0 = 0 or X0' = 1). 
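
The coordinate mechanism can be sketched in Python as follows. The truth table of Fig. 8c 
is not reproduced here, so the wrap-around from X = 3 to X = 1 is an assumption derived from 
"an adder of one modulo 3" and the coordinate range 1 to 3; the function names are 
hypothetical. 

```python
def next_coordinate(x):
    """Coordinate passed to the next cell: 1 -> 2 -> 3 -> 1 (assumed wrap-around)."""
    if x not in (1, 2, 3):
        raise ValueError("X must be 1, 2 or 3")
    return x % 3 + 1


def displayed_character(x):
    """Character displayed by the cell, selected by the least significant bit X0."""
    if x not in (1, 2, 3):
        raise ValueError("X must be 1, 2 or 3")
    x0 = x & 1                    # X0 of the binary coordinate 01, 10 or 11
    return "L" if x0 == 1 else "S"
```

Evaluating `displayed_character` for X = 1, 2, 3 gives the three characters of the acronym 
in order. 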

These specifications are sufficient for designing the final architecture of the 
totipotent cell (Fig. 9). According to the Tom Thumb algorithm, half of the 
molecules of the cell, i.e. 21, are genotypic (G) and have no functionality: they 
are used for storing a copy of the original genome. The others are phenotypic 
molecules, divided into six categories (from 1 to 6) depending on their 
functionality: 

1. Two busses for the horizontal transfer of the X coordinate. 

2. Modulo 3 incrementation. 

3. One bus for the vertical distribution of the X0 logic variable. 

4. Permanent display of characters L and S. 

5. Display of S character only (X0 = 0 or X0' = 1). 

6. Display of L character only (X0 = 1). 

The final totipotent cell is therefore made up of 6 x 7 = 42 molecules con- 
nected according to the pattern in Fig. 10a: bottom-up in the odd columns, 
top-down in the even columns, with the lower row reserved for closing the loop. 
It is then possible to define all the flags in the rightmost memory position of each 
molecule (grey characters in Fig. 10a) without forgetting the branch activation 
Fig. 8. Specifications of the Acronymus elegans. (a) Three-cell artificial organism. (b) 
The totipotent cell. (c) Truth table of the coordinate incrementer. (d) Logic diagram 
and symbol of the incrementer. 



and north connection flag in the lower molecule of the first column (F = 8), the 
north branch and east connection flag in the upper molecule of the first column 
(F = E), and the east branch and west connection flag in the lower molecule of 
the last column (F = D). 

According to our algorithm, the 21 phenotypic molecules (Fig. 9) occupy 
a fixed, pre-determined position in the totipotent cell (Fig. 10a). The other 
21 genotypic molecules are used for storing and circulating the final genome 
whose detailed information, i.e. 21 x 3 = 63 hexadecimal characters (Fig. 10b), 
is derived by reading clockwise the fixed characters (grey characters in Fig. 10a) 
of the whole loop, starting with the lower molecule of the first column. Finally, 
we just assume that each genotypic molecule will not affect the display in order 
to respect the original specifications. 

Last, it was possible to embed the basic molecule in each of the 2000 field- 
programmable gate arrays of the BioWall [11] and to show the growth of a 
first multicellular artificial organism, Acronymus elegans, followed by its self- 
replication in both vertical and horizontal dimensions (Fig. 10c). We therefore 
obtain a population of identical organisms, i.e. clones, thus creating a fourth 
level in our hierarchy, a population of organisms. 
Fig. 9. The genotypic and phenotypic molecules of the totipotent cell. 



4 Conclusion 

4.1 Present and Future Applications 

Several years before the publication of the historical paper by Crick and Watson 
[12] revealing the existence and the detailed architecture of the DNA double 
helix, von Neumann was already able to point out that a self-replicating ma- 
chine required the existence of a one-dimensional description, the genome, and 
a universal constructor able to both interpret (translation process) and copy 
(transcription process) the genome in order to produce a valid daughter organism. 
Self-replication makes it possible not only to divide a mother cell (artificial or 
living) into two daughter cells, but also to grow and repair a complete organism. 
Self-replication is now considered a central mechanism, indispensable for circuits 
that will be implemented through the nascent field of nanotechnologies [4] [10], 
particularly when the fault-tolerant properties associated with our developmental 
approaches are taken into consideration [9]. 

A first field of application of our new self-replicating loop is quite natu- 
rally classical self-replicating automata, such as three-dimensional reversible au- 
tomata [5] or asynchronous cellular automata [8]. 

A second, and possibly more important field of application is Embryonics, 
where artificial multicellular organisms are based on the growth of a cluster of 
cells, themselves produced by cellular division [6] [7]. It is within this context 
that cellular differentiation will become a key aspect of our growth mechanism, 
as each newly-created cell identifies its position and its designated role within 
the complete organism. 

Finally, other possible open avenues concern the evolution of such loops 
and/or their capability to carry out massive parallel computation [3]. 
Fig. 10. Realization of the Acronymus elegans. (a) The 6 x 7 = 42 molecules of 
the totipotent cell, (b) Genome, (c) BioWall implementation (Photograph by A. 
Badertscher). 



4.2 Emergence and Complexity 

If we assume the existence of a silicon substrate organized as a homogeneous 
matrix of basic elements or “molecules”, we observe that the injection of a finite 
string of discrete symbols, the genome, successively produces the emergence of 
cells (by cellular division), of multicellular organisms (by cellular differentiation) 
and, finally, of a population of identical organisms, i.e. clones (by cyclic repetition 
of the coordinates). This emergence is not magic, and follows from a deterministic 
use of logic symbols considered as the configuration of the molecules, themselves 
implemented by field-programmable gate arrays (FPGAs). Emergence is then 
the trivial result of a chain of several mechanisms (cellular division and differ- 
entiation, repetition of coordinates). This appearance of a complex system (a 
population of Acronymus elegans in our example) from rather simple molecules 
and a short genome, simply creates the illusion of an internal growth of com- 
plexity. 

4.3 Measurement of Complexity 

The measurement of the complexity of embryonic machines, in the Universe of 
computer science, is rather delicate. We will try to propose a first and some- 
what rough approximation. Referring to the definition of the complexity due to 
Kolmogorov, we measure the complexity K(G) of the genome of our artificial 
organism as the length of the smallest program able to generate this genome, to 
which we add the complexity K(C) necessitated by the configuration transforming 
an element of the FPGA (in our case, a Xilinx Spartan XCS10XL circuit) 
into a molecule. 

This mode of calculation accounts for the complexity of the software (the 
genome) and the hardware (the FPGA), and allows for comparing the complexity 
of concurrent realizations on the same silicon substrate. In the case of Annulus 
elegans (Fig. 1a), we have K(G) = 6 x 4 = 24 bits, with K(C) = 24’896 bits, 
i.e. a total of K(G) + K(C) = 24’920 bits, while in the case of Acronymus 
elegans (Fig. 10b), we have K(G) = 252 bits with K(C) = 26’808 bits, i.e. a 
total of K(G) + K(C) = 27’060 bits. We observe, in the last case, that the 
configuration of the molecule is a hundred times more complex than that of the 
genome necessitated by the construction of the “LSL” acronym. This enormous 
difference can also be found in biology, where most of the genetic material is 
aimed at producing the ribosome, which is roughly equivalent to our artificial 
molecule able, in the Embryonics project, to decode and interpret the artificial 
genome. 
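
The totals quoted above can be checked with a few lines of arithmetic. The following is only a sketch: the function name total_complexity is ours, the bit counts are the ones given in the text, and the 252-bit genome is taken to be the 63 hexadecimal characters of Fig. 10b at 4 bits each.

```python
def total_complexity(genome_bits: int, config_bits: int) -> int:
    """Rough complexity estimate K(G) + K(C), both expressed in bits."""
    return genome_bits + config_bits


# Annulus elegans: K(G) = 6 x 4 bits, K(C) = 24'896 bits.
annulus = total_complexity(6 * 4, 24_896)

# Acronymus elegans: 63 hexadecimal characters of 4 bits, K(C) = 26'808 bits.
acronymus = total_complexity(63 * 4, 26_808)

# How many configuration bits there are for each genome bit.
ratio = 26_808 / (63 * 4)

print(annulus, acronymus, round(ratio))
```

The ratio comes out slightly above one hundred, which is the order of magnitude referred to in the text.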

Finally, we hope that our reader will be convinced that computer science 
may achieve its own growth by renewing its inspiration from the observation of 
the marvelous machines which populate the living world. 
Abstract. This article describes a novel approach to the implementation 
on an electronic substrate of a process analogous to the cellular 
division of biological organisms. Cellular division is one of the two pro- 
cesses that allow the multicellular organization of complex living beings, 
and is therefore a key mechanism for the implementation of bio-inspired 
features such as development (growth) and self-repair (cicatrization). In 
particular, we shall describe the architecture and operation of a new kind 
of programmable logic device capable of realizing, in silicon, a cellular 
division process. 



1 Introduction 

The majority of living beings, with the exception of unicellular organisms such as 
bacteria, share a common multicellular organization: the organism 
is divided into a finite number of cells, each realizing a single function (skin, 
neuron, muscle, etc.). This architecture relies on two mechanisms: 

— Cellular division is the process through which each cell achieves its duplica- 
tion. During this phase, a cell copies its genetic material (the genome) and 
splits into two identical daughter cells. 

— Cellular differentiation defines which function a cell has to realize. This 
specialization, which essentially depends on the cell position in the organism, is 
obtained through the expression of a part of the genome. 

In a multicellular organism, each cell contains the whole of the organism’s 
genetic material (the genome) and is therefore “universal”, i.e. potentially capable 
of replacing any other cell. In the presence of physical degradation, each living 
organism is then potentially capable of self-repair (cicatrization). 

Of these two mechanisms, the most difficult to realize in silicon-based sys- 
tems is cellular division. In biological organisms, in fact, this feature implies, if 
not the creation, at least the formation of new physical entities (cells). Such ma- 
nipulation of the material substrate is not possible in today’s electronic circuits, 
which cannot be physically altered after fabrication. Luckily for bio-inspired 
research, programmable logic devices (FPGAs) [9,11,3] can be used to approximate 
this process by allowing the manipulation not of the physical substrate, 
but of the logical structure of a circuit. This article describes our approach to 
the implementation of a process analogous to biological cellular division in such 
a programmable circuit. 

Our approach is based on the Embryonics project (Section 2) for all that 
concerns multi-cellular organization and cellular differentiation, and on the Cell 
Matrix system (Section 3) for self-inspection based replication. Section 4 will 
then introduce the system we have developed. An example of our system will 
be presented in Section 5. 

2 The Embryonics Project 

The main goal of the Embryonics (embryonic electronics) project [8,10] is to 
implement, in an integrated circuit, a system inspired by the development of 
multi-cellular organisms and capable of self-test and self-repair. 

In Embryonics (Figure 1), an artificial organism (ORG)¹ is realized by a set 
of cells, distributed at the nodes of a regular two-dimensional grid. Each cell 
contains a small processor coupled with a memory used to store the program 
(identical for all the cells) that represents the organism’s genome. In the organ- 
ism, each cell realizes a unique function, defined by a sub-program called the 
gene, which is a part of the genome. The gene to be executed in a cell is se- 
lected depending on the cell’s position within the organism, defined by a set of 
X and Y coordinates. In Figure 1, the genes are labeled A to F for coordinates 
(X,Y) = (1,1) to (X,Y) = (3,2). 

The first kind of self-replication in Embryonics systems is that of the organ- 
ism: an artificial organism is capable of replicating itself when there is enough 
free space in the silicon circuit (at least six cells in the example of Figure 1) to 
contain the new daughter organism and if the calculation of the coordinates pro- 
duces a cycle. In fact, as each cell is configured with the same information (the 
genome), the repetition of the vertical coordinate pattern (Y = 1 → 2 → 1 → 2) 
causes the repetition of the same pattern of genes and therefore, in a sufficiently 
large array, the self-replication of the organism for any number of specimens in 
the X and/or the Y axes. 
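
The effect of cyclic coordinates can be illustrated with a minimal sketch. It assumes the 3 x 2 organism and the 6 x 4 array of Figure 1; the assignment of the letters A to F to individual coordinates and the names GENOME, next_coordinate and differentiate are hypothetical, chosen only for illustration.

```python
ORG_WIDTH, ORG_HEIGHT = 3, 2  # organism of Figure 1: 3 x 2 cells

# Hypothetical gene layout: which letter sits at which coordinate is an assumption.
GENOME = {(1, 1): "A", (2, 1): "B", (3, 1): "C",
          (1, 2): "D", (2, 2): "E", (3, 2): "F"}


def next_coordinate(value: int, period: int) -> int:
    """Increment a coordinate cyclically: 1 -> 2 -> ... -> period -> 1."""
    return value % period + 1


def differentiate(columns: int, rows: int) -> list[list[str]]:
    """Return the gene expressed by every cell, row by row starting at Y = 1."""
    tissue = []
    y = 1
    for _ in range(rows):
        row, x = [], 1
        for _ in range(columns):
            row.append(GENOME[(x, y)])  # every cell holds the whole genome
            x = next_coordinate(x, ORG_WIDTH)
        tissue.append(row)
        y = next_coordinate(y, ORG_HEIGHT)
    return tissue


tissue = differentiate(columns=6, rows=4)
for row in tissue:
    print(" ".join(row))
```

Because every cell stores the same genome, repeating the coordinates is enough to repeat the gene pattern: the 6 x 4 array ends up holding four copies of the organism.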

In Embryonics, however, there is a second replication process, which corre- 
sponds to the cellular division process in biological entities, used to put in place 
the initial array of cells that will then be differentiated to obtain the organism. 

The need to build cells of different size and structure depending on the ap- 
plication naturally led to the use of programmable logic (FPGAs) as a physical 
substrate in Embryonics. Each of the computational elements of the FPGA can 
then be seen as a molecule, and molecules are assembled in a precise configuration 
to form a cell. As all cells are identical, the development process is analogous to 
replicating this configuration as many times as there are cells in the organism. 

To implement this replication, Embryonics splits the process into two phases: 

— the structural phase, where a “skeleton” is created in order to divide the 
physical space into a collection of groups of molecules, which are empty cells; 

¹ Group of artificial cells that executes a given task. 




Fig. 1. Self-replication of a 6-cell organism in a limited homogeneous array of 6x4 cells. 
Only the expressed gene is shown in each cell. 



— the configuration phase, where the configuration is sent in parallel into all 
the empty cells created during the structural phase. 

The structural phase is implemented by a small cellular automaton (CA) [1], 
integrated in the molecular grid, capable of transforming a one-dimensional 
string of states (analogous to a configuration bitstream stored in a memory 
chip) into a two-dimensional structure (the blocks that will host the cells). 

Figure 2 shows that the CA elements are placed in the spaces between the 
molecules of the FPGA. With an appropriate sequence of states, the automaton 
will be able to partition the array into identical blocks of variable size. The 
“skeleton” created by the automaton during the structural phase can be seen as 
the membrane of the artificial cells. Once the membrane is in place, the second 
part of the cellular self-replication begins: a bitstream containing the genome 
is sent to all the blocks in parallel (Figure 2), automatically creating multiple 
copies of the same artificial cell. At the end of this phase, the coordinate-based 
differentiation mechanism defines the structure of the organism. 

This replication process is quite different from biological cellular division, as 
all cells are created in parallel. Our new approach, designed to be integrated 
with Embryonics, implements a cellular division much closer to reality. 



3 Cell Matrix 

Developed by N. Macias [4], Cell Matrix is a fine-grained reconfigurable archi- 
tecture, composed of a two-dimensional grid of identical elements (referred to as 
cells²). Each cell in the grid is interconnected with its four cardinal neighbors 
and contains a lookup table (LUT) used as a truth table to define its outputs. 

² A cell in the Cell Matrix architecture can be compared to a molecule in the 
Embryonics tissue. 




220 E. Petraglio et al. 



[Figure 2 labels: membrane, configuration path, membrane state, FPGA elements, 
FPGA configuration. Input sequence: 2 1 1 2 1 1 2 1 1 2] 






Fig. 2. First, an input sequence is sent to the CA (represented by the lozenge network) 
in order to set up the membrane of each cell. Then, the genome is sent in parallel to 
each cell composing the ORG. 



Cells of Cell Matrix circuits are capable of self-replication using a self- 
inspection mechanism: each cell is at the same time configurable and able to 
configure other cells without any external command. This feature is called self- 
duality, and requires two different modes of operation for the cell: D-mode (Data 
mode) and C-mode (Control mode). In D-mode, the cell’s LUT processes the 
four input signals in order to generate output signals, whereas in C-mode, the 
input data is used to fill up (configure) or re-write (re-configure) the LUT of the 
cell. 

The self-duality feature is the key to the self-replication mechanism of a Cell 
Matrix cell. Using this mechanism, a cell is able to produce an identical copy of 
itself, inspecting its own truth table and copying it to another cell in the circuit. 

Figure 3 shows a 2x2 Cell Matrix grid, in which each cell has two inputs and 
two outputs per side. Each cell contains an internal 16-row by 8-column truth 
table, in which four 4-variable universal functions can be realized. This internal 
memory governs the combinational behavior of the cell. 




Fig. 3. 2x2 Cell Matrix grid. Each cell, which has two inputs and two outputs per edge, 
is connected with its four direct neighbors. 
A cell uses its D lines to exchange information with its neighbors; the nature 
of this information is defined by the mode in which the cell is operating. In 
D-mode, a cell reads its D input values and uses them to select one of the 
16 lines of its LUT, generating the eight outputs of the cell. In C-mode, the 
information sent to the cell through the D lines is serially shifted into its LUT, 
while the previous contents of the LUT are shifted out to the D outputs. Using 
this mechanism, a cell is able to configure its neighbors. The C inputs are used 
to define the operating mode of the cell. If any of the cell’s C inputs is 1, the 
cell is in C-mode, otherwise it is in D-mode. 
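
The two operating modes can be summarized by a small behavioral model. This is a sketch under simplifying assumptions, not the Cell Matrix implementation: the class name CellMatrixCell is ours, the LUT is stored as a flat list of 16 x 8 bits, a single serial D line is modeled in C-mode, and the first D input is taken as the most significant bit of the row index.

```python
class CellMatrixCell:
    """Simplified model of a cell with a 16-row by 8-column truth table."""

    ROWS, COLS = 16, 8

    def __init__(self):
        self.lut = [0] * (self.ROWS * self.COLS)

    @staticmethod
    def mode(c_inputs):
        """A cell is in C-mode if any of its C inputs is 1, else in D-mode."""
        return "C" if any(c_inputs) else "D"

    def evaluate(self, d_inputs):
        """D-mode: the four D inputs select one LUT row, giving eight outputs."""
        row = 0
        for bit in d_inputs:  # assumption: first input is the most significant bit
            row = (row << 1) | bit
        start = row * self.COLS
        return self.lut[start:start + self.COLS]

    def shift(self, d_in):
        """C-mode: shift one bit into the LUT and return the bit shifted out."""
        d_out = self.lut[0]
        self.lut = self.lut[1:] + [d_in]
        return d_out


cell = CellMatrixCell()
# Configuration in which odd rows output all ones and even rows all zeros.
bitstream = [(i // 8) % 2 for i in range(128)]
shifted_out = [cell.shift(bit) for bit in bitstream]
print(cell.mode((0, 0, 1, 0)), cell.evaluate([0, 0, 0, 1]))
```

The bits returned by shift are the previous LUT contents, which is what allows a cell in C-mode to be read as well as written by a neighbor.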

A cell, using its LUT, can change its C output values and therefore can 
control the mode of any neighboring cell. By changing the mode of its neighbor 
cells, a single cell can reach and control any cell placed in the circuit. Figure 4 
shows a typical programming sequence. First, cell X configures the LUT of cell Y 
in order to transform it into a simple wire, which passes C and D data 
from X to Z. After this step the D and C outputs of X are directly connected 
to the D and C inputs of Z, hence cell X is able to configure the LUT of Z by 
sending its information directly through the D line. 







Fig. 4. Cell X configuring non-adjacent cell Z. On the left, cell X, by using truth table 
TT1, configures cell Y (its direct neighbor) as a wire. On the right, cell Y is configured 
with the TT1 truth table and cell X can directly configure cell Z using the TT2 truth table. 



Our approach will combine the self-replication capabilities of a Cell Matrix 
architecture with the Embryonics approach in order to create a new molecule 
which will better approximate the biological cellular division process in Embry- 
onics systems. 

4 Self-Inspection on Embryonics Tissues 

In our approach, we will redefine the molecules of the Embryonics project in 
a form similar to the cells of the Cell Matrix circuit (a LUT and two types 
of input/output connections) in order to realize a cellular division mechanism 
based on self-inspection [2] in Embryonics systems. Moreover, we will present the 
design of a finite state machine (FSM), which is placed in each molecule: during 
the configuration phase, this FSM will control the hardware configuration and 
the communication protocol of the molecule. 

By definition, in self-inspection based self-replication, a replicating system 
has to generate its description by examining its own structure. Such a descrip- 
tion is then used to recreate an identical copy of the original system [7]. In 
our case, the system which performs the self-replication is realized on an integrated 
circuit which consists of a surface of silicon divided into identical elements 
(i.e., molecules) distributed at the nodes of a regular two-dimensional grid. Each 
molecule contains the same hardware components and only the contents of its 
registers can differentiate it from its neighbors. Therefore, a self-replicating cell 
creates its description by reading the contents of its registers. We will say that 
our system performs a simplified kind of self-inspection because it achieves 
self-replication by copying the contents of its registers and not by generating the 
description of the new copy. 

4.1 Molecular Self-Replication 

The first step of the design is the realization of a molecule capable of replicating 
its content in one of its four neighbors. Note that this phase of 
self-replication draws inspiration from the Cell Matrix mechanism presented in 
Figure 4. However, the design of our molecule is entirely original, as the molecule 
has to fit the requirements of Embryonics. 

Let us consider that the information that a molecule has to send to its neigh- 
bor in order to replicate itself is fully contained in its LUT. Since the molecule, 
as in the Cell Matrix cell, has one D output line per edge, it has to read and send 
the content of its LUT serially. To implement this behavior, the molecule can 
use its LUT as a shift-register in which input and output are connected together. 
The molecule can then create a rotating bitstream and read (self-inspect) and 
send out (self-replicate) its information through the D output line, as shown in 
Figure 5. 
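
The rotating memory can be sketched in software as follows. This is an illustrative model only: the function names self_inspect and receive and the 8-bit example contents are assumptions, and the real LUT size is not modeled.

```python
from collections import deque


def self_inspect(lut_bits):
    """Rotate a looped shift register once and return (stream, final contents)."""
    register = deque(lut_bits)
    stream = []
    for _ in range(len(register)):
        bit = register.popleft()  # bit leaving the register output
        register.append(bit)      # fed back to the input, so nothing is lost
        stream.append(bit)        # the same bit is sent on the D output line
    return stream, list(register)


def receive(stream, size):
    """Shift an incoming stream into an initially empty register of `size` bits."""
    register = deque([0] * size, maxlen=size)
    for bit in stream:
        register.append(bit)      # the oldest bit falls out at the other end
    return list(register)


source = [1, 0, 1, 1, 0, 0, 1, 0]
stream, after = self_inspect(source)
copy = receive(stream, len(source))
print(after == source, copy == source)
```

After one full rotation the source register holds its original contents again, while the target register holds an identical copy.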




Fig. 5. A shift register implementing the rotating memory. 



The molecule, as in Cell Matrix, also has to set its C output line to 1 in 
order to change the operation mode of its target neighbor. The latter is now 
able to receive the information arriving from the source molecule. The end of 
the replication process is signaled by a one-bit register (called overflow) that is 
set when the LUT has been completely filled (the mechanism to detect the end 
of the replication process in Cell Matrix is not described in the publications). 
By testing the register, the target molecule knows that the replication process is 
finished and sends an acknowledgment signal through its C output line to the 
source molecule (Figure 6). Once the configuration is over, the source molecule 
returns to a quiescent state while the target molecule becomes a source molecule 
and starts to self-replicate. 
Fig. 6. (a) First, the two molecules are in C-mode. The source molecule sends its 
memory content, through the D line, into the target molecule. (b) When the target molecule 
is configured (i.e. the overflow bit is set), it sends back an acknowledgment signal, 
which will stop the configuration process. 




Fig. 7. Sequence of states describing the behavior of a molecule during the molecular 
self-replication. 



Like a Cell Matrix cell, our molecule can operate in two different modes (C 
or D). The example of Figure 6, however, defines new switching rules between 
these two modes. This behavior can be represented by the state graph shown in 
Figure 7. 

In the Waiting for configuration state, the molecule is not configured and 
quiescent. When a C input becomes 1, the molecule passes into the Receiving 
configuration state, switches to C-mode, and starts filling its LUT with the 
incoming configuration. When the configuration process is over, the molecule 
generates an acknowledgment signal and waits for its C input value to return to 
0. At this point, the molecule changes its internal state to Replicating but is still 
operating in C-mode, ready to configure one of its neighbors. Once the molecule 
has sent out its LUT content and received the acknowledgment signal, it returns 
to D-mode and reaches the last state (Ready to operate) on the graph, where 
it will remain during the normal operation of the circuit. 
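
The sequence of states described above can be written as a small transition function. This is a sketch of our reading of the text, not the circuit itself: the signal names c_in, overflow and ack_in, and the functions next_state and in_c_mode, are assumptions introduced for illustration.

```python
from enum import Enum


class State(Enum):
    WAITING = "Waiting for configuration"
    RECEIVING = "Receiving configuration"
    REPLICATING = "Replicating"
    READY = "Ready to operate"


def next_state(state, c_in, overflow, ack_in):
    """One step of the molecular state graph.

    c_in:     1 if any C input of the molecule is 1
    overflow: 1 once the LUT has been completely filled
    ack_in:   1 when the neighbor being configured acknowledges
    """
    if state is State.WAITING and c_in:
        return State.RECEIVING
    if state is State.RECEIVING and overflow and not c_in:
        return State.REPLICATING  # configured, and the C input is back to 0
    if state is State.REPLICATING and ack_in:
        return State.READY
    return state


def in_c_mode(state):
    """The molecule works in C-mode while receiving and while replicating."""
    return state in (State.RECEIVING, State.REPLICATING)


state = State.WAITING
for signals in [(1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 1)]:
    state = next_state(state, *signals)
    print(state.value)
```

Once the Ready to operate state is reached, no combination of these three signals makes the molecule leave it in this model.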
4.2 Cellular Self-Replication 

The next step of the design is the insertion of new hardware that will allow 
the molecules to assemble into an artificial cell. Note that, in recent 
publications [5,6], the Cell Matrix system has been extended to include the 
possibility of grouping cells in order to compose a structure called a super-cell³. 
However, we chose to develop our own system capable of creating artificial cells. 
Such a system will be expressly conceived to work with our new molecules and 
will be fully compatible with the requirements of the Embryonics project 
(self-replication and local fault-tolerance). As a consequence, the mechanisms 
introduced in this section for the creation of cells are very different from those 
exploited by Cell Matrix’s super-cells. 

An artificial membrane, as in Embryonics cells, will be used to define which 
molecules belong to which cell. This membrane will be realized with a 4-bit 
register (MembReg) which will store the relative position of the molecule in the 
cell (Figure 8), thus defining its state. 

In a dividing cell, each molecule can be in one of the following states: 

— ready to operate: the molecule has already executed its replication; 

— replicating: the molecule is configuring one of its neighbors; 

— ready to replicate: the molecule is waiting to start its self-replication. 

Therefore, the FSM introduced above must be modified in order to handle 
the new molecular states. 




Fig. 8. A cell composed of 2x2 molecules. Each MembReg is filled in order to define 
the position of the molecule in the cell. N.B.: If the molecule is placed in the center of 
the cell, the MembReg content is equal to zero. 



As an example, Figure 8 shows a set of 2x2 molecules, which will represent 
the source cell that will use self-inspection to self-replicate its contents (i.e. the 
artificial genome) in order to create a daughter cell on its right. First, each 
molecule in the cell will use the membrane information to know its position. In 
this example, a molecule can start to replicate (replicating state) if it belongs to 
the left part of the membrane; otherwise it has to wait (ready to replicate state). 

³ A super-cell in the Cell Matrix architecture can be compared to a cell in the 
Embryonics tissue. 

The molecules in the leftmost column send out their LUT contents on the east 
D output lines. The information travels through the columns of molecules in the 
ready to replicate state and reaches the empty target molecules (Figure 9). The 
source molecules configure the targets until they receive the acknowledgment 
signals (Figure 10). At this point, the target molecules are totally configured 
and in the ready to replicate state. The first column then stops sending data, switches 
its state from replicating to ready to operate and changes the values of its C lines 
from 1 to 0. This change is picked up by the second column of molecules, which 
switches its state from ready to replicate to replicating and starts to replicate. 
The first cellular division is over when the second column of molecules ends its 
replication. 
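
The column-by-column division can be traced with a short simulation. It is a minimal sketch assuming a division towards the east only; the function name divide_cell, the representation of a cell as a list of columns and the placeholder LUT contents are ours.

```python
def divide_cell(mother):
    """Replicate a cell column by column towards the east.

    mother: list of columns, each column being a list of LUT contents.
    Returns the daughter cell and a log of the column states over time.
    """
    width = len(mother)
    state = ["ready to replicate"] * width
    daughter = [None] * width
    log = []
    for col in range(width):
        state[col] = "replicating"
        log.append(tuple(state))
        # The data travels east through the waiting columns and fills the
        # next empty column of the daughter cell.
        daughter[col] = list(mother[col])
        # Acknowledgment received: the column stops and its C lines return to 0,
        # which lets the next column start.
        state[col] = "ready to operate"
    log.append(tuple(state))
    return daughter, log


mother = [["m00", "m01"], ["m10", "m11"]]  # 2x2 cell, contents are placeholders
daughter, log = divide_cell(mother)
for step in log:
    print(step)
```

In this model only one column is in the replicating state at any time, which mirrors the hand-over between columns described above.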




Fig. 9. The molecules composing the source column are in the replicating state, while 
the other molecules in the mother cell are in the ready to replicate state. The grey boxes 
represent the configured elements, while the blank boxes represent empty elements. 



The state graph for this new cellular behavior is shown in Figure 11. 

It should be noted that, according to the new state graph, a cell is able to self- 
replicate in one direction only. That is, the self-replication mechanism presented 
above is not able to fill up an FPGA circuit, which is a two-dimensional structure. 
Therefore, the molecule and the state graph have to be further modified in order 
to perform a two-dimensional cellular division. A possible solution is to use a 
second acknowledgment signal, Ack Cell, which will inform all the molecules in 
the cell that the cellular division is over and instruct the cell to restart its 
replication in a different direction. 
Fig. 10. The molecules composing the target column are totally configured. Each 
molecule, using the overflow register, generates an acknowledgment signal in order 
to stop the molecular self-replication. 




Fig. 11. Sequence of states describing the behavior of a molecule during the cellular 
self-replication. 



For example, if a cell is created in the bottom left corner of an FPGA, this 
cell has to replicate itself to the north and then to the east. In order to fill all 
the available space in the circuit, each new cell has to replicate exactly as its 
mother. It should be noted that, with these growth rules, two source cells could 
be configuring the same set of molecules at the same time. In order to avoid such 
data collisions, only the cells on the east side of the FPGA are able to replicate 
in the two dimensions. Figure 12 shows the two-dimensional growth behavior. 



[Figure 12 panels: the FPGA at t = 0, t = 1 and t = 2, with the source and target 
cells marked at each step] 



Fig. 12. Cellular division, two-dimensional growing behavior. 
5 An Example: The LSL Organism 

This section will present an extremely simplified example, the display of the 
acronym “LSL”, for Logic Systems Laboratory. This acronym is considered as a 
one-dimensional artificial organism composed of three cells (Figure 13a). Each 
cell is located by an X coordinate, ranging from 1 to 3 in decimal or from 01 to 
11 in binary. For coordinate values X = 1 and X = 3, the cell should implement 
the L character, while for X = 2, it should implement the S character. A totipotent 
cell (Figure 13b) then comprises 6 x 7 = 42 molecules, 36 of which are 
invariant, five displaying the S character, and one displaying the L character. A 
modulo-3 counter is embedded in the final organism and implements the truth 
table shown in Table 1. 



Table 1. The modulo-3 counter. 



Character   X   X1 X0   X+   X1+ X0+ 
L           1   0  1    2    1   0 
S           2   1  0    3    1   1 
L           3   1  1    1    0   1 



The modulo-3 counter is represented by the logic diagram and symbol of 
Figure 15. According to the table, the value of the binary variable X0 is sufficient 
to distinguish the display of character L (X0 = 1) from the display of character 
S (X0 = 0 or X0' = 1). It is important to note that the modulo-3 counter is 
designed to produce a cycle in the calculation of the coordinates, in our example: 
X = 1 → 2 → 3 → 1 ... and Y = 1 → 1 → 1 .... This coordinate calculation will 
differentiate the cells in order to produce as many artificial organisms composed of 
three cells as possible (Figure 14). 
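
The behavior of the counter and of the display selection can be reproduced with a short sketch. The function names increment_mod3 and character are ours; the coordinate X = 0 (binary 00) does not appear in Table 1, and sending it to X = 1 is an assumption of this sketch.

```python
def increment_mod3(x1, x0):
    """Return the next coordinate bits, following 01 -> 10 -> 11 -> 01."""
    x = 2 * x1 + x0
    nxt = x % 3 + 1  # 1 -> 2, 2 -> 3, 3 -> 1 (and 0 -> 1, by assumption)
    return nxt >> 1, nxt & 1


def character(x0):
    """The least significant coordinate bit is enough to choose the character."""
    return "L" if x0 == 1 else "S"


# Display produced by a row of six cells, starting at X = 1 (binary 01).
x1, x0 = 0, 1
row = ""
for _ in range(6):
    row += character(x0)
    x1, x0 = increment_mod3(x1, x0)
print(row)
```

A row of six cells therefore holds two complete copies of the three-cell organism.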








Fig. 13. (a) The three cells of the artificial organism, (b) The totipotent cell. 



These specifications are sufficient to design the final architecture of the 
totipotent cell, which is shown in Figure 16. The final cell is composed of seven 
different types of molecules, each of which realizes one of the following tasks: 
Fig. 14. A population of artificial organisms obtained by the repetition of the cellular 
coordinates X = 1 → 2 → 3 → 1 ... and Y = 1 → 1 → 1 .... The cells shown have 
coordinates X = 1 2 3 1 2 3. 




Fig. 15. Logic diagram and symbol of the incrementer. 



— 0: Functionless molecule. 

— 1: Two horizontal busses, which carry the X0 and X1 signals. 

— 2: Modulo 3 incrementation. 

— 3: One vertical bus, which carries the X0 signal. 

— 4: Permanent switch-on display. 

— 5: Display of S character only (X0 = 0 or X0' = 1). 

— 6: Display of L character only (X0 = 1). 

Figure 16 shows that each molecule composing the totipotent cell has to 
memorize two characters. The first character, which is in brackets, represents 
the hexadecimal value memorized in the MembReg and used to construct the 
artificial membrane (Table 2), while the second character, which is in bold font, 
controls the molecular behavior and can be coded using three bits. Therefore, in 
order to implement this example, the logic core of our new molecule is composed 
of a 4x1 LUT, which is used to memorize the second character controlling the 
task that the molecule has to realize. 

The artificial genome, which has to be injected into the FPGA in order to 
create the first totipotent cell, is represented in Figure 17. Note that 
each row represents the configuration stream of a row of molecules, and can be 
split into several groups of 8 bits, each group representing the configuration of 
a single molecule (i.e. membrane information + molecular task). Analyzing the 
membrane information of the first row, we can see that this flow of information 
will configure the south row of the cell. 

Fig. 16. The final totipotent cell made up of 6x7 = 42 molecules. 

Last, it was possible to embed the basic molecule in each of the 2000 
field-programmable gate arrays of the BioWall [12] and to show the growth of the 
multicellular artificial organism, followed by its self-replication in both vertical 
and horizontal dimensions (Figure 18). We therefore obtain a population of 
identical organisms (i.e. clones), thus creating a fourth level in our hierarchy, a 
population of organisms. 

In Figure 18 it is interesting to note that two cells are displaying a corrupted 
line (or column). This is the characteristic behavior of a cell during its 
replication phase. In fact, a replicating cell inspects its content in order to 
send it to its daughter. Therefore, during this phase, the cell is not able to 
correctly perform its task (in our example: display an L or an S). 

Table 2. Possible contents of the MembReg register 

HEX value   N S E W   Position in the cell 
0           0 0 0 0   molecule placed in the center of the cell 
1           0 0 0 1   west molecule 
2           0 0 1 0   east molecule 
4           0 1 0 0   south molecule 
5           0 1 0 1   southwest molecule 
6           0 1 1 0   southeast molecule 
8           1 0 0 0   north molecule 
9           1 0 0 1   northwest molecule 
A           1 0 1 0   northeast molecule 
others                invalid combination 
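
The mapping in Table 2 can be sketched in a few lines of Python. This is only an illustration, not the authors' hardware: the function names are hypothetical, and the assumption that the membrane information occupies the high nibble of each 8-bit configuration group is ours (the text only says that a group holds membrane information plus molecular task). 

```python
# Position names for the valid MembReg values listed in Table 2.
# Bit order is N S E W, most significant bit first.
MEMBREG_POSITIONS = {
    0x0: "center",
    0x1: "west",
    0x2: "east",
    0x4: "south",
    0x5: "southwest",
    0x6: "southeast",
    0x8: "north",
    0x9: "northwest",
    0xA: "northeast",
}


def decode_membreg(value):
    """Return the (N, S, E, W) flags and position name of a 4-bit value."""
    if not 0 <= value <= 0xF:
        raise ValueError("MembReg holds a 4-bit value")
    flags = tuple((value >> shift) & 1 for shift in (3, 2, 1, 0))
    position = MEMBREG_POSITIONS.get(value, "invalid combination")
    return flags, position


def split_configuration_byte(byte):
    """Split an 8-bit group into (membrane nibble, task nibble).

    Assumption (not stated in the paper): membrane bits come first.
    """
    return (byte >> 4) & 0xF, byte & 0xF
```

Any value that sets both N and S, or both E and W, falls through to "invalid combination", which matches the last row of the table. 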



Fig. 17. The artificial genome of the totipotent cell. 



6 Conclusion 

This paper presented a novel approach for self-replication for Embryonics sys- 
tems, closer to biological cellular division than past mechanisms. The new 
molecule thus introduced will be the basic block of a new generation of Em- 
bryonics circuits, and is designed to be integrated into a more complex system 
to implement a full developmental (growth) process in digital hardware. 

However, self-replication, as described, is not a very useful feature for current 
real-world applications, as self-replication is completely deterministic. In other 
words, its result is a predictable FPGA configuration, which could be realized 
by conventional FPGA place and route tools. 

This argument, however, is not valid if the self-replication process embeds 
fault-detection and self-repair mechanisms to, for example, grow artificial 
organisms on an FPGA with fabrication flaws. In fact, standard FPGA place and 
route tools are not capable of generating bitstream configuration files for FPGA 
circuits which are not fault-free. A cell composed of molecules capable of 
autonomous self-replication, fault-detection and self-repair, however, will be 
able to handle flawed FPGAs by adapting its structure so as to avoid faulty 
areas of the circuit. The next step in the design of the new Embryonics molecule 
will address this issue and realize autonomous self-replication in faulty 
circuits. 

Fig. 18. The BioWall implementation 

Moreover, the predicted development of molecular-level electronics (such as 
nanotechnologies) implies a forthcoming need for mechanisms that will allow a 
circuit to structure itself, rather than having a structure imposed at fabrication. 
A cellular division approach such as the one described in this article could be a 
useful tool to achieve this kind of self-organization. 
Abstract. Nowadays, networks of artificial spiking neurons might contain 
thousands of synapses. Although software solutions offer flexibility, their 
performance decreases as the number of neurons and synapses increases. 
Embedded systems very often require real-time execution, which does not allow 
an unconstrained increase of the execution time. On the other hand, hardware 
solutions, given their inherent parallelism, offer the possibility of adding 
neurons without increasing the execution time. In this paper we present a 
functional model of a spiking neuron intended for hardware implementation. Some 
features of biological spiking neurons are abstracted, while preserving the 
functionality of the network, in order to define an architecture with low 
implementation cost in field programmable gate arrays (FPGAs). Adaptation of 
synaptic weights is implemented with Hebbian learning. As an example 
application we present a frequency discriminator to verify the computing 
capabilities of a generic network of our neuron model. 



1 Introduction 

The human brain contains more than 10^11 neurons connected to one another in an 
intricate network. Communication between neurons is done by spikes, millions of 
which are emitted each second in every volume of cortex. Issues like the 
information contained in such a spatio-temporal pattern of pulses, the code used 
by the neurons to transmit information, or the decoding of the signal by 
receptive neurons, are of fundamental importance in the problem of neuronal 
coding. They are, however, still not fully resolved [1]. 

Biological neurons are extremely complex biophysical and biochemical entities. 
Before designing a model it is therefore necessary to develop an intuition for 
what is important and what can be safely neglected. Biological models attempt to 
describe the neuron response as faithfully as possible in terms of biological 
parameters, such as ion-channel conductances or dendrite lengths. However, 
biological models are not suitable for computational purposes. For this reason, 
phenomenological neural models have been proposed, which extract the most 
relevant features from their biological counterparts. 

As for any neuron model, adaptivity is required for a network of these neural 
models. Taking adaptivity to mean any modification made to a network so that it 
performs a given task, several types of methods can be identified according to 
the type of modification done. The most common methods modify the synaptic 
weights [2], the network topology [5,11], or both. 

Synaptic weight modification is the most widely used approach, as it provides a 
relatively smooth search space. On the other hand, topology modification alone 
produces a search space with a highly rugged landscape (i.e. a small change in 
the network results in very different performance), and although this type of 
adaptation explores the space of computational capabilities of the network 
well, it is difficult to find a solution. Growing [5], pruning, and genetic 
algorithms [11] are adaptive methods that modify a network topology. 

A hybrid of both methods could achieve better performance, because the 
weight-adaptation method helps to smooth the search space, making it easier to 
find a solution. We thus propose a hybrid method where the structure is adapted 
by modifying the network topology, allowing the exploration of different 
computational capabilities. These capabilities are evaluated by weight 
learning, which in this way finds a solution to the problem. However, topology 
modification implies a high computational cost: weight learning can itself be 
time-consuming, and its cost is multiplied by the number of topologies to be 
explored. Under these conditions, on-line embedded applications would be 
unfeasible unless enough knowledge of the problem is available to restrict the 
search space to tuning a few small modifications of the topology. 

Part of the problem can be solved with a hardware implementation: in this case 
the execution time is greatly reduced, since the network is evaluated with the 
neurons running in parallel. A complexity problem remains: while in software 
extra neurons and connections imply just some extra loops, in a hardware 
implementation a limited area bounds the number of neurons that can be placed 
in a network. This is because each neuron has a physical existence, occupying a 
given area, and each connection implies a physical wire that must connect two 
neurons. Moreover, if topologies are explored, the physical resources 
(connections and neurons) for the most complex possible network must be 
allocated in advance, even if the final solution is less complex. This makes 
connectivity a critical issue, since a connection matrix for a large number of 
neurons consumes considerable resources. 

Recent FPGAs make it possible to tackle this resource availability problem 
thanks to their dynamic partial reconfiguration (DPR) feature, which allows 
internal logic resources to be reused. With this feature, the same physical 
logic units can be dynamically reconfigured with different configurations, 
reducing the hardware requirements and optimizing the number of neurons and the 
connectivity resources. 

Topology evolution with DPR is part of our project but is not presented in this 
paper; a description of it can be found in [9]. The DPR feature allows a 
modular system such as the one described in Figure 1: different configurations 
are available for each module, and each module can communicate with its 
neighbor modules, allowing spikes to be transmitted forward or backward for 
recurrent networks. Modules contain layers of neurons with a predefined 
connectivity, and a genetic algorithm would search for the best combination of 
layers. 

In this paper we present a hardware architecture for a functional neuron with 
Hebbian learning, intended for embedded smart devices. A neural network is 
implemented on an FPGA to check the number of neurons with a given connectivity 
that can be contained in a given device. The network topology corresponds to 
the module-based system that we have just described. In order to verify the 
learning capabilities of our network, a dynamic problem is assigned to it: 
discrimination of two signals with different frequencies. 




Fig. 1. Layout of our reconfigurable network topology. Fixed and reconfigurable modules are 
allowed. In this example, fixed modules code and decode inputs and outputs to spikes, while 
reconfigurable modules contain network layers. 



2 Neural Models 

Most neuron models, such as the perceptron or radial basis functions, use 
continuous values as inputs and outputs, processed using logistic, Gaussian or 
other continuous functions [5,2]. In contrast, biological neurons process 
pulses: as a neuron receives input pulses through its dendrites, its membrane 
potential increases according to a post-synaptic response. When the membrane 
potential reaches a certain threshold value, the neuron fires and generates an 
output pulse through the axon. The best known biological model is the Hodgkin 
and Huxley model (H&H) [3], which is based on ion current activities through 
the neuron membrane. 

However, given their complexity, the most biologically plausible models are not 
the best suited for computational purposes. This is why simplified approaches 
are needed. The leaky integrate and fire (LI&F) model [1,4], based on a current 
integrator, models a resistance and a capacitor in parallel. Differential 
equations describe the voltage given by the capacitor charge, and when a certain 
voltage is reached the neuron fires. The spike response model of order 0 (SRM0) 
[1,4] offers a response resembling that of the LI&F model, with the difference 
that the membrane potential is expressed in terms of kernel functions instead of 
differential equations. 

Spiking-neuron models process discrete values representing the presence or 
absence of spikes; this allows a simple connectivity structure at the network 
level and a striking simplicity at the neuron level. However, implementing 
models like SRM0 and LI&F on digital hardware is largely inefficient, wasting 
many hardware resources and exhibiting a large latency due to the 
implementation of kernels and numeric integrations. This is why a functional 
hardware-oriented model is necessary to achieve fast architectures at a 
reasonable chip area cost. 



2.1 The Proposed Neuron Model 

Our simplified integrate and fire model [8], like standard spiking models, uses 
the following five concepts: (1) membrane potential, (2) resting potential, (3) 
threshold potential, (4) postsynaptic response, and (5) after-spike response 
(see Figure 2). A spike is represented by a pulse. The model is implemented as a 
Moore finite state machine with two states: operational and refractory. 

During the operational state, the membrane potential is increased (or decreased) 
each time a pulse is received by an excitatory (or inhibitory) synapse, and then 
it decreases (or increases) with a constant slope until it reaches the resting 
value. If a pulse arrives when a previous postsynaptic potential is still 
active, its action is added to the previous one. When a firing condition is 
fulfilled (i.e., potential > threshold) the neuron fires, the potential takes on 
a hyperpolarization value called the after-spike potential, and the neuron then 
passes to the refractory state. 



Fig. 2. Response of the model to a train of input spikes (traces: input spikes, 
output spike). 

After firing, the neuron enters a refractory period in which it recovers from 
the after-spike potential to the resting potential. Two kinds of refractoriness 
are allowed: absolute and partial. Under absolute refractoriness, input spikes 
are ignored. Under partial refractoriness, the effect of input spikes is 
attenuated by a constant factor. The refractory state acts like a timer that 
determines the time needed by a neuron to recover from a firing; the time is 
completed when the membrane potential reaches the resting potential, and the 
neuron returns to the operational state. 
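
The two-state behavior described above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the authors' hardware description: the class and attribute names are hypothetical, the default values are taken from the set-up parameters table (Table 2) of this paper, and the order of operations within a time slice (add weights, test the firing condition, then apply one slope) is our reading of the text. 

```python
class FunctionalNeuron:
    """Two-state (operational / refractory) integrate-and-fire sketch."""

    def __init__(self, weights, resting=32, threshold=128, after_spike=18,
                 increasing_slope=1, decreasing_slope=1, lower_bound=-128,
                 refractory_gain=0.0):
        self.weights = list(weights)
        self.resting = resting
        self.threshold = threshold
        self.after_spike = after_spike
        self.increasing_slope = increasing_slope
        self.decreasing_slope = decreasing_slope
        self.lower_bound = lower_bound
        # 0.0 models absolute refractoriness; a value in (0, 1) models
        # partial refractoriness (inputs attenuated by a constant factor).
        self.refractory_gain = refractory_gain
        self.potential = resting
        self.refractory = False

    def step(self, spikes):
        """Advance one time slice; return True if the neuron fires."""
        # Add the weights of all synapses that received a spike.
        drive = sum(w for w, s in zip(self.weights, spikes) if s)
        if self.refractory:
            drive = int(drive * self.refractory_gain)
        self.potential = max(self.potential + drive, self.lower_bound)

        if not self.refractory and self.potential > self.threshold:
            # Fire: jump to the after-spike potential, go refractory.
            self.potential = self.after_spike
            self.refractory = True
            return True

        # Relax toward the resting potential with a constant slope.
        if self.potential > self.resting:
            self.potential = max(self.potential - self.decreasing_slope,
                                 self.resting)
        elif self.potential < self.resting:
            self.potential = min(self.potential + self.increasing_slope,
                                 self.resting)

        # The refractory state ends once the resting potential is reached.
        if self.refractory and self.potential >= self.resting:
            self.refractory = False
        return False
```

With the default values, a neuron that has just fired needs 14 silent time slices to climb from the after-spike potential (18) back to the resting potential (32), which is the "timer" role of the refractory state. 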

Our model simplifies some features with respect to SRM0 and LI&F, mainly the 
post-synaptic response. The way in which several input spikes are processed 
affects the dynamics of the system; in the presence of 2 simultaneous input 
spikes, SRM0 performs a linear superposition of post-synaptic responses, while 
our model adds the synaptic weights to the membrane potential. Even though our 
model is less biologically plausible than SRM0 and LI&F, it is still 
functionally adequate. 



A Hardware Implementation of a Network of Functional Spiking Neurons 237 



2.2 Learning 

Weight learning is an issue that has not been satisfactorily solved for spiking 
neuron models. Several learning rules have been explored by researchers, Hebbian 
learning being one of the most studied [1,4]. Hebbian learning modifies the 
synaptic weight $W_{ij}$, considering the simultaneity of the firing times of 
the pre- and post-synaptic neurons i and j. Here we describe an implementation 
of Hebbian learning oriented to digital hardware. Two modules are added to the 
neuron presented above: the active-window module and the learning module. 

The active-window module determines whether a given neuron's learning window is 
active or not (Figure 3). The module activates the active-window output 
($aw_i$) for a certain time after a neuron $n_i$ generates a spike. The $aw_i$ 
signal is given by $aw_i = \mathrm{step}(t_i) - \mathrm{step}(t_i + w)$, where 
$t_i$ is the firing time of $n_i$ and $w$ is the size of the learning window. 
This window allows the receptor neuron ($n_j$) to determine the synaptic weight 
modification ($\Delta W_{ij}$) that must be done. 



Fig. 3. Hebbian learning windows. When neuron 3 fires, the learning windows of 
neurons $n_1$ and $n_2$ are disabled and enabled, respectively. At that time, 
synaptic weight $W_{13}$ is decreased by the learning algorithm, while 
$W_{23}$ is increased. 

The learning module modifies the synaptic weights of the neuron, performing the 
Hebbian learning (Figure 3). Given a neuron ($n_i$) with $k$ inputs, when $n_i$ 
fires the learning module modifies the synaptic weights ($W_{ij}$ with 
$j = 1, 2, \ldots, k$). A learning rate $\alpha$ and a decay rate $\beta$ must 
be defined in advance to obtain the expression 
$\Delta W_{ij} = \alpha \, aw_j - \beta$, where $\alpha$ and $\beta$ are 
positive constants and $\alpha > \beta$. 
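
The window and the weight update can be illustrated with a short Python sketch. The function names are hypothetical, the default constants come from the set-up parameters table of this paper (learning rate 6, decay rate 4, weight bounds -32 and 127), and treating the window as the half-open interval starting at the firing time is our assumption about how the difference of two steps is meant. 

```python
def active_window(t, firing_time, window):
    """Return 1 while the learning window is open at time slice t, else 0.

    Assumption: the window covers firing_time <= t < firing_time + window.
    """
    if firing_time is None:  # the neuron has not fired yet
        return 0
    return 1 if firing_time <= t < firing_time + window else 0


def hebbian_update(weights, windows, alpha=6, beta=4, w_min=-32, w_max=127):
    """Apply dW_j = alpha * aw_j - beta to every input weight, bounded."""
    return [min(max(w + alpha * aw - beta, w_min), w_max)
            for w, aw in zip(weights, windows)]
```

Because alpha > beta, an input whose window is open when the neuron fires gains alpha - beta, while an input whose window is closed loses beta; with the values above that is +2 and -4 per firing. 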

These two modules, active-window and learning, increase the amount of 
interneuron connectivity, as they imply, respectively, one extra output and k 
extra inputs for a neuron. 



3 The Proposed Neuron on Hardware 

Several hardware implementations of spiking neurons have been developed on 
analog and digital circuits. Analog electronic neurons tend to be difficult to 
set up and debug. Digital spiking neurons, on the other hand, are easier to set 
up, debug, and scale, and learning is easier to implement on them, among other 
features. Additionally, these models can be rapidly prototyped and tested 
thanks to configurable logic devices such as field programmable gate arrays 
(FPGAs). 



3.1 Field Programmable Gate Arrays 

An FPGA circuit is an array of logic cells placed in an infrastructure of 
interconnections. Each logic cell is a universal function or a functionally 
complete logic device that can be programmed by a configuration bitstream, as 
can the interconnections between cells, to realize a certain function [7]. Some 
FPGAs support dynamic partial reconfiguration (DPR), where a reduced bitstream 
reconfigures just a given subset of internal components. DPR is done with the 
device active: certain areas of the device can be reconfigured while other 
areas remain operational and unaffected by the reprogramming. 



3.2 Implementation of Our Model 

The hardware implementation of our neuron model is illustrated in Figure 4. The 
computation of a time slice (iteration) is triggered by a pulse at the input 
clk_div, and takes a certain number of clock cycles depending on the number of 
inputs to the neuron. The synaptic weights are stored in a memory, which is 
swept by a counter. When an input spike is present, its respective weight is 
enabled to be added to the membrane potential. Likewise, the decreasing and 
increasing slopes (for the post-synaptic and after-spike responses 
respectively) are stored in the memory. 

Although the number of inputs to the neuron is parameterizable, increasing the 
number of inputs raises both the area cost and the latency of the system. 
Indeed, the area cost depends strongly on the memory size, which itself depends 
on the number of inputs to the neuron (e.g. the 32x9 neuron in Figure 4 has a 
memory size of 32x9 bits, where the 32 positions correspond to up to 30 input 
weights plus the increasing and decreasing slopes; 9 bits is the arbitrarily 
chosen data-bus size). The time required to compute a time slice is equal to 
the number of inputs + 1 clock cycles, i.e. 30 inputs plus either the 
increasing or the decreasing slope. (More details on the neuron architecture 
can be found in [9].) 
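
As a worked example of this sizing rule, the sketch below computes the memory size and the cycles per time slice for a given number of inputs. The function name and the returned keys are hypothetical; the rule itself (one memory word per input weight plus two words for the slopes, and inputs + 1 cycles per slice) is taken from the paragraph above. 

```python
def neuron_cost(num_inputs, data_bits=9):
    """Memory size and latency of one neuron for a given number of inputs."""
    # One word per input weight plus the increasing and decreasing slopes.
    words = num_inputs + 2
    return {
        "memory_words": words,
        "memory_bits": words * data_bits,
        # All input weights are swept, plus one of the two slopes.
        "cycles_per_slice": num_inputs + 1,
    }
```

For the 30-input neuron this gives 32 words, 288 bits and 31 clock cycles per time slice. The same rule gives 16 and 64 memory words for 14 and 62 inputs, which is consistent with the other neuron sizes synthesized in Section 4.1. 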

The dark blocks in Figure 4, the active-window and learning modules, perform the 
learning in the neuron. The active-window block consists of a counter that 
starts counting when an output spike is generated, and stops when a certain 
value is reached. This value constitutes the learning window. The output aw_out 
is logic 0 if the stop condition of the counter is met and logic 1 otherwise. 

Fig. 4. (a) External view of the neuron, (b) Architecture of a neuron. 

The learning module performs the synaptic weight learning. It computes the 
change to be applied to the weights ($\Delta W$), provided that the respective 
learning window is active, while keeping the weights bounded. At each clock 
cycle the module computes the new weight for the synapse pointed to by the 
COUNTER signal; however, these new weights are stored only when an output spike 
is generated by the current neuron, which asserts the write-enable of the 
memory. 



4 Experimental Setup and Results 

The experimental setup consists of two parts: the synthesis of a spiking neural net- 
work on an FPGA and a simulation of this network solving a problem of frequency 
discrimination running on Matlab. 



4.1 The Network on Hardware 

A neural network was implemented on an FPGA in order to check the number of 
neurons that we could include in a network. We worked with a Spartan II xc2s200 
FPGA from Xilinx Corp. [10], with a maximum capacity of up to 200,000 logic 
gates. This FPGA has a matrix of 28 x 42 configurable logic blocks (CLBs), each 
containing 2 slices, which hold the logic where the functions are implemented, 
for a total of 2352 slices. The xc2s200 is the largest device of the low-cost 
Spartan II FPGA family. Other FPGA families, such as Virtex II, offer up to 40 
times more logic resources. 

We implemented the 30-input neuron described in Figure 4 both with and without 
learning (Table 1). Without learning the neuron used 23 slices (0.98% of the 
whole FPGA), while with the learning modules it used 41 slices (1.74%). In order 
to observe the scalability of the neuron model, we also implemented two other 
non-learning neurons. A 14-input neuron (requiring a smaller memory to store the 
weights) needed 17 slices (0.72%), and a 62-input neuron used 46 slices (1.95%). 
The resources required by our neurons are very low compared to other, more 
biologically plausible implementations (e.g., Ros et al. [6] use 7331 Virtex-E 
CLB slices for 2 neurons). 

Table 1. Synthesis results for the neurons. 

          Non-learning neurons                  Learning neurons 
          14-input    30-input    62-input      30-input    Full network 
          neuron      neuron      neuron        neuron      (30 30-input neurons) 
Slices    17          23          46            41          1350 
Usage     0.72%       0.98%       1.95%         1.74%       57.4% 


Using the 30-input neuron, we implemented a network with 3 layers, each layer 
containing 10 neurons that are internally fully connected. Additionally, layers 
provide outputs to the preceding and the following layers, and receive outputs 
from them, as described in Figure 5. For the sake of modularity, each neuron has 
30 inputs: 10 from its own layer, 10 from the preceding one, and 10 from the 
next one. Once implemented, a single layer uses 450 slices (19.13%), and the 
full network, with 30 neurons, needs 1350 slices (57.4%). 
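
The utilization figures quoted in this section follow directly from the device size. The sketch below reproduces the arithmetic; the constant and function names are hypothetical. 

```python
# 28 x 42 CLB matrix, 2 slices per CLB (Spartan II xc2s200, as stated above).
TOTAL_SLICES = 28 * 42 * 2


def utilization(slices):
    """Percentage of the device's slices used by a design."""
    return 100.0 * slices / TOTAL_SLICES
```

With 2352 slices in total, 23, 41, 450 and 1350 slices correspond to about 0.98%, 1.74%, 19.13% and 57.4% respectively. The 46-slice neuron comes out at roughly 1.956%, so the 1.95% quoted for it appears to be truncated rather than rounded. 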




4.2 Application: A Frequency Discriminator 

A frequency discriminator was implemented in order to test the capability of the 
learning network to solve, without supervision, a problem with dynamic 
characteristics. We used the network described in the previous subsection, with 
the following considerations for data presentation: (1) we use 9 inputs at 
layer 1 to introduce the pattern, (2) two sinusoidal waveforms with different 
periods are generated, (3) the waveforms are normalized and discretized to 9 
levels, (4) a spike is generated every 3 time slices (iterations) at the input 
that corresponds to the discretized signal level (Figure 6). 
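
The four data-presentation steps can be sketched as follows. This is an illustrative encoding only: the function name, the zero starting phase of the sinusoid and the way the normalized value is mapped to one of the 9 levels are assumptions, since the paper does not give these details. 

```python
import math


def encode_sinusoid(period, num_slices, levels=9, spike_interval=3):
    """Return one row of `levels` 0/1 input values per time slice."""
    rows = []
    for t in range(num_slices):
        row = [0] * levels
        if t % spike_interval == 0:
            # Normalize the sinusoid from [-1, 1] to [0, 1].
            normalized = (math.sin(2 * math.pi * t / period) + 1) / 2
            # Discretize to a level index in 0 .. levels - 1.
            level = min(int(normalized * levels), levels - 1)
            row[level] = 1
        rows.append(row)
    return rows
```

Each returned row holds the 9 input lines for one time slice; at most one line carries a spike, and only every third slice carries a spike at all. 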




Fig. 6. Neural activity on a learned frequency discriminator (horizontal axis: 
time slices, x 10^4). The lowest nine lines are the input spikes to the network: 
two waveforms with periods of 43 and 133 time slices are presented. The next 10 
lines show the neuron activity at layer 1, and the following lines show the 
activity of layers 2 and 3. After 10,000 time slices a clear separation can be 
observed at the output layer, where neurons fire in the presence of only one of 
the waveforms. 



For the simulation setup, some parameter ranges and resolutions had to be taken 
into account; these are constrained by the hardware implementation, as shown in 
Table 2. Initial weights are integers generated randomly from 0 to 127. 

Table 2. Set-up parameters 

Neuron parameters                Learning parameters 
Resting potential         32     Learning rate            6 
Threshold potential      128     Decay rate               4 
After-spike potential     18     Weight upper bound     127 
Increasing slope           1     Weight lower bound     -32 
Decreasing slope           1     Learning window size    16 
Potential lower bound   -128 

Different combinations of two signals are presented as shown in Figure 6. During 
the first 6000 time slices the signal is swapped every 500 time slices, leaving 
an interval of 100 time slices between presentations in which no input spike is 
presented. After that, the interval between presentations is removed, and the 
signals are swapped every 500 time slices (as shown in Figure 6). 
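
One possible reading of this presentation schedule is sketched below. The text does not say whether the 100-slice silent interval is part of each 500-slice block or added to it; the sketch assumes the former (400 slices of signal followed by 100 silent slices), and the function name and return convention are hypothetical. 

```python
def active_signal(t, block=500, gap=100, gap_phase_end=6000):
    """Return 0 or 1 for the waveform shown at time slice t, None if silent.

    Assumption: during the first `gap_phase_end` slices, the last `gap`
    slices of every block carry no input spikes.
    """
    signal = (t // block) % 2  # the two waveforms alternate block by block
    if t < gap_phase_end and t % block >= block - gap:
        return None
    return signal
```

Under this reading, time slices 0 to 399 show the first waveform, 400 to 499 are silent, 500 to 899 show the second waveform, and from slice 6000 onward the waveforms alternate every 500 slices with no silent interval. 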

Several combinations of signals with different periods are presented to the 
network. Some of the signals are correctly separated, while others are not 
(Table 3). Several simulations were run for each pair of periods; satisfactory 
separations were easier to obtain for a certain range of periods (from 50 to 
110 approx.). The closer the periods are to these values, the easier it is to 
find a network able to separate them. When both signals have periods above 120 
or below 80, no separation was achieved by the network. 

Table 3. Signal periods presented to the network. Period units are time slices. 

Period 1   Period 2   Separation 
40         100        Yes 
43         133        Yes 
47         73         No 
47         91         Yes 
50         100        Yes 
73         150        No 
73         190        Yes 
101        133        Yes 
115        190        Partially 
133        170        No 
133        190        No 



It must be noted that the separable period range depends strongly on how the 
data are presented to the network. In our case, we generate a spike every 3 time 
slices; however, if higher (or lower) frequencies are to be separated, spikes 
must be generated at higher (or lower) rates. The period range is also 
determined by the dynamic characteristics of the neuron: after-spike potential 
and increasing and decreasing slopes. They determine the membrane potential 
response after input and output spikes, playing a fundamental role in the 
dynamic response of the full network. 



5 Conclusions and Future Work 

We have presented a functional spiking neuron model suitable for hardware 
implementation. The proposed model neglects many characteristics of biological 
and software-oriented models. Nevertheless, it keeps its functionality and is 
able to solve a relatively complex task such as temporal pattern recognition. 
Since the neuron model is highly simplified, the lack of representation power of 
single neurons must be compensated by a higher number of neurons, which in terms 
of hardware resources could be a reasonable trade-off considering the 
architectural simplicity of the model. 

With the frequency discriminator implementation, unsupervised Hebbian learning 
used alone has been shown to be effective but not efficient. Although solutions 
were found for a given set of frequencies, we consider that better solutions 
could be found with the available number of neurons. While it remains useful for 
some classification tasks, Hebbian learning is inaccurate for other 
applications. In future work we will include hybrid techniques combining Hebbian 
learning with reinforcement or supervised learning. 

This work is part of a larger project in which neural network topologies will be 
evolved in order to explore a larger search space, with adaptation done by 
synaptic weight and topology modifications. Topology exploration is done using 
the partial reconfiguration feature of Xilinx FPGAs. 

Spiking-neuron models seem to be the best choice for this kind of implementation, 
given their low requirements of hardware and connectivity, keeping good computa- 
tional capabilities, compared to other neuron models [4]. Likewise, layered topolo- 
gies, which are among the most commonly used, seem to be the most suitable for our 
implementation method. However, other types of topologies are still to be explored. 

Different search techniques could be applied with our methodology. Genetic 
algorithms are among the most generic, simple, and well known of these 
techniques, but we are convinced that they are not the best choice: they do not 
take into account information that could be useful to optimize the network, 
such as the direction of the error. An example of this can be found in [5], 
where growing and pruning techniques are used to find the correct size of a 
network. 



References 

1. W. Gerstner, W. Kistler. Spiking Neuron Models. Cambridge University Press, 
2002. 

2. S. Haykin. Neural Networks: A Comprehensive Foundation. 2nd ed., 
Prentice-Hall, New Jersey, 1999. 

3. A. L. Hodgkin, A. F. Huxley. A quantitative description of membrane current 
and its application to conduction and excitation in nerve. J. Physiol. (Lond.), 
117:500-544, 1952. 

4. W. Maass, Ch. Bishop. Pulsed Neural Networks. The MIT Press, Massachusetts, 
1999. 

5. A. Perez-Uribe. Structure-adaptable digital neural networks. PhD thesis, 
EPFL, 1999. 
http://lslwww.epfl.ch/pages/publications/rcnt_theses/perez/PerezU_thesis.pdf 

6. E. Ros, R. Agis, R. R. Carrillo, E. M. Ortigosa. Post-synaptic Time-Dependent 
Conductances in Spiking Neurons: FPGA Implementation of a Flexible Cell Model. 
Proceedings of IWANN'03: LNCS 2687, pp. 145-152, Springer, Berlin, 2003. 

7. S. M. Trimberger. Field-Programmable Gate Array Technology. Kluwer Academic 
Publishers, 1994. 

8. A. Upegui, C. A. Pena-Reyes, E. Sanchez. A Functional Spiking Neuron Hardware 
Oriented Model. Proceedings of IWANN'03: LNCS 2686, pp. 136-143, Springer, 
Berlin, 2003. 

9. A. Upegui, C. A. Pena-Reyes, E. Sanchez. A methodology for evolving spiking 
neural-network topologies on line using partial dynamic reconfiguration. To 
appear in Proceedings of the II International Congress on Computational 
Intelligence (CIIC'03), Medellin, Colombia. 

10. Xilinx Corp. www.xilinx.com 

11. X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 
87(9):1423-1447, September 1999. 




A Study on Designing Robot Controllers by 
Using Reinforcement Learning with 
Evolutionary State Recruitment Strategy 



Toshiyuki Kondo and Koji Ito 

Dept. of Computational Intelligence and Systems Science, 
Interdisciplinary Graduate School of Science and Engineering, 
Tokyo Institute of Technology 
4259 Nagatsuta, Midori-ku, Yokohama 226-8502, JAPAN, 
kon@dis.titech.ac.jp, 
http://www.ito.dis.titech.ac.jp/~kon/ 



Abstract. Recently, much attention has been focused on utilizing reinforcement 
learning (RL) for designing robot controllers. However, as the state spaces of 
these robots become continuous and high dimensional, learning becomes a 
time-consuming process. In order to adopt RL for designing the controllers of 
such complicated systems, not only adaptability but also computational 
efficiency should be taken into account. In this paper, we introduce an 
adaptive state recruitment strategy which enables a learning robot to rearrange 
its state space conveniently according to the task complexity and the progress 
of the learning. 



1 Introduction 

In recent years, a number of intelligent robots which can demonstrate 
outstanding environment recognition and/or highly complicated motions have been 
developed. However, most of the controllers for these robots are strictly 
designed by skilled engineers who know well the environments the robots will be 
situated in. As may be easily imagined, these robots can be brittle under 
unexpected environmental changes, because it is impossible for human designers 
to anticipate every possible situation in advance. Therefore, in the research 
fields of artificial life and robotics, great interest has been paid to emergent 
adaptation methods like reinforcement learning (RL), evolutionary computation 
(EC) and other constitutive methods for designing controllers. 

Among them, evolutionary robotics (ER) [15], where robot controllers are
designed by using EC techniques (e.g. genetic algorithms, evolution
strategies, ...), seems to be promising, since it requires only a performance
measurement during a fixed period (e.g. lifetime) instead of immediate
evaluations or supervised signals. However, assuming that the EC process is
executed on a real robot, it would take an incredibly long time to evaluate
multiple candidates/individuals serially. Due to this, in actual
implementations of ER, the best solution obtained in computer simulations has
been transferred to the real robot. However, as suggested in Eggenberger et
al. [5], this “transfer approach” suffers from the problem known as the “gap
between simulation and reality”, because it is essentially hard to simulate
the actual dynamics of the real world [3,5,11]. For that reason, a hybrid
approach in which the EC is modified to be able to deal with on-line learning
(e.g. Hebbian learning) has been proposed and has shown remarkable
performance in designing real robot controllers [5,13,15]. A robot with the
on-line learning ability can be expected to demonstrate robustness in the
real world, since it can autonomously rearrange its sensorimotor coordination
according to the environment in which it is situated.

Additionally, in the last decade, much attention has been focused on utilizing
RL for designing robot controllers [1,2,14,17]. Instead of actual supervised
signals, RL requires only the appropriateness (i.e. reward/punishment) of an
individual's action sequences. By choosing actions which maximize the
cumulative reinforcement signal, the robots can learn to behave more
appropriately. Although RL could be a promising approach for designing real
robot controllers without human intervention, some difficulties remain. One
of them is known as the “curse of dimensionality.” As the state space of a
learning system (e.g. a robot) becomes high dimensional, the learning process
becomes time-consuming. One of the effective strategies to overcome the
problem is adopting hierarchical RL [14], where the state space is
appropriately structured in a hierarchical fashion. To realize a good
hierarchization, however, human designers have to be able to choose the
dominant sensors/actuators by using prior knowledge. This implies that the
designer needs to know the details of the surrounding environment a priori.

Moreover, in order to endow the controllers with a high affinity for real
environments, the sensorimotor mapping should be represented as continuous
functions. However, because of computational limitations, discrete
representations have generally been used in most real-world RL applications.
As a consequence, to adopt RL for complicated real robot systems, not only
“adaptability” but also “computational efficiency” should be taken into
account. To reduce the computational costs of learning (e.g. processing time)
and to represent continuous states efficiently, multilayer perceptrons
(MLPs), radial basis function (RBF) networks, the normalized Gaussian network
(NGnet) [12], and other function approximation methods are employed as the
sensorimotor mapping representation in a great deal of the RL literature
[14,17,18].

Especially in the case where a Gaussian function (e.g. an RBF) is used as the
basis, the incremental learning ability of the network is, thanks to the
locality of the basis, superior to the case where a sigmoid function is
adopted. In addition, the NGnet requires a smaller number of basis functions
than RBF networks (i.e. the NGnet uses less computational resources), because
of its extrapolation property [17,18].
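
The locality and extrapolation remarks can be illustrated numerically. The following is a minimal sketch, assuming one-dimensional inputs, two units centred at 0.0 and 1.0 and a common width of 0.2 (all illustrative values that do not come from the experiments). It compares raw Gaussian activations with their normalized counterparts at a point well outside the region covered by the centres.

```python
import math

def gaussian(x, mu, sigma):
    """Raw (unnormalized) Gaussian RBF activation in one dimension."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def normalized(x, centers, sigma):
    """Normalized Gaussian activations: raw activations divided by their sum."""
    raw = [gaussian(x, mu, sigma) for mu in centers]
    total = sum(raw)
    return [a / total for a in raw]

centers = [0.0, 1.0]   # hypothetical unit centres
sigma = 0.2            # hypothetical common width
x_far = 2.0            # a point outside the region covered by the centres

raw_far = [gaussian(x_far, mu, sigma) for mu in centers]
norm_far = normalized(x_far, centers, sigma)
# raw_far: both activations are almost zero (a plain RBF network "sees" nothing).
# norm_far: the nearest unit takes almost all of the normalized activation,
# so the network still produces that unit's output far from its centre.
```

In this sketch the raw activations vanish away from the centres, whereas the normalized activations still sum to one, which is one way to read the extrapolation property mentioned above.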

As has already been noted, RL with a certain type of neural function
approximator is useful for representing robot controllers. However, the
questions “how should the parameters of the basis functions (i.e. the
center/width of each RBF) be determined?” and “how many basis units should be
used for approximating unknown functions?” are still open. Although these
parameters (hereafter model parameters) are essential for the learning
ability of the network, they have depended heavily on the human designers. A
fine partitioning of the state space makes learning time-consuming because of
the large number of basis functions, while a coarse one might bring about an
inadequate state division. So these model parameters must be adjusted on-line
according to the task complexity and the progress of the learning.

In the literature of pattern classification, several studies [6,16,17,18,20]
deal with the above problem, known as the “resource allocation problem.”
The adaptive basis addition (ABA) method, in which an RBF unit of
predetermined size is incrementally inserted as needed, has been widely used.
Platt [16] proposed the resource allocating network (RAN), which adjusts the
width of a newly incorporated RBF unit according to the distance to its
nearest neighbor. In the growing Neural Gas proposed by Fritzke [6], the
self-organizing map (SOM) [9] and competitive Hebbian learning are combined:
a new basis unit is inserted near the unit which has the largest accumulated
error, and the number of RBF units is changed (usually increased) during the
growing process. In recent years, Sato et al. introduced the use of on-line
EM algorithms for solving the problems in the RL literature [18,21]. Moreover,
Samejima proposed adaptive basis division (ABD) [17], in which the entire
state space is gradually divided in accordance with approximation errors.

Although these approaches have their own advantages, the following points
should be taken into account when applying them to RL. (1) Some of them
require high computational costs for estimating the validity of the divided
states. (2) Once the state discretization has terminated, it is difficult to
rearrange/reconstruct the partition. Thus on-line classification algorithms
should have a solution to the tradeoff between exploration and exploitation.
(3) Most of these methods have been applied only to relatively low
dimensional tasks in computer simulations.

Based on the above considerations, in this paper we propose an evolutionary
state recruitment strategy for RL. The strategy enables a learning system to
allocate an appropriate number and size of RBF units according to the task
complexity and the progress of learning. The proposed recruitment strategy is
explained in detail in the following section. In section 3, the method is
applied to designing a peg pushing robot controller in a simulation
environment. Section 4 summarizes the contents of the paper and discusses the
future prospects of the study.



2 Reinforcement Learning with Adaptive State
Recruitment

2.1 Actor Critic Reinforcement Learning with NGnet 

As has been noted, in a number of RL applications, neural function
approximators like MLPs and RBF networks have generally been utilized as
sensorimotor mappings (i.e. controllers) and/or a state value function. In
particular, the normalized Gaussian network (NGnet) requires fewer basis
functions than RBF networks, thanks to its extrapolation property [17,18].
Thus, the NGnet shown in Fig. 1 is adopted for approximating the functions.

Fig. 1. Structure of NGnet.



The learning system assumed in the paper is based on actor-critic RL [2].
In the learning architecture, the actor corresponds to the controller of a
system (i.e. a mapping from sensor inputs $x_t \in \Re^M$ to motor outputs
$m(x_t) \in \Re^N$, where M and N are the dimensions of the inputs and
outputs). On the other hand, the critic is an estimator of the cumulative
state value $V(x_t) \in \Re$, defined as

$$V(x_t) = E\left[ \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \right], \tag{1}$$

where $r_t$ is a reinforcement signal (i.e. reward/punishment) and $\gamma$
is the discount factor.
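
As an illustration of the quantity inside the expectation in equation (1), the following minimal sketch computes the discounted sum for one finite reward sequence. The function name and the truncation to a finite list are assumptions made for illustration; the critic described here estimates the expected value of this sum rather than computing it from a stored sequence.

```python
def discounted_return(rewards, gamma):
    """Discounted sum of a finite reward sequence.

    rewards[k] plays the role of r_{t+k+1}; the infinite sum of Eq. (1)
    is truncated at the end of the list (an assumption of this sketch).
    """
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total

# Three consecutive rewards of 1.0 with a discount factor of 0.9
example_return = discounted_return([1.0, 1.0, 1.0], 0.9)
```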

As the actor and the critic networks are represented by the NGnet, they can
approximate arbitrary continuous functions. In the paper, we assume that both
networks use the same hidden units, as shown in Fig. 1.

The outputs of the actor network $m(x_t)$ and the critic network $V(x_t)$ are
calculated as

$$a_i(t) = \exp\left[ -\frac{1}{2} (x_t - \mu_i)^T \Sigma_i^{-1} (x_t - \mu_i) \right], \tag{2}$$

$$\Sigma_i = \mathrm{diag}(\sigma_{i1}^2, \sigma_{i2}^2, \cdots, \sigma_{ij}^2, \cdots, \sigma_{iM}^2), \tag{3}$$

$$b_i(t) = \frac{a_i(t)}{\sum_{j=1}^{N_b} a_j(t)}, \tag{4}$$

$$m_k(x_t) = m_k^{max} \cdot f\left[ \sum_{j=1}^{N_b} w_{kj} b_j(t) + \beta n_k(t) \right], \tag{5}$$

$$V(x_t) = \sum_{j=1}^{N_b} v_j b_j(t), \tag{6}$$

where $a_i$, $\mu_i$, $\Sigma_i$ are the output, mean vector and
variance-covariance matrix of the i-th basis function, respectively, and T
indicates the transpose. $b_i$ is the output of the i-th hidden unit, $N_b$
represents the number of RBF units in the network, and $w_{kj}$ and $v_j$
denote weight parameters. In order to add an exploration factor, a noise term
$n_k$, which follows the normal distribution and is scaled by the noise ratio
$\beta$, is incorporated in equation (5). Furthermore, the function
$f[\cdot]$ is a squashing function (i.e. a sigmoid function) that keeps the
output within [−1, 1], and $m_k^{max}$ indicates the maximum motor output.
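
The forward computation of equations (2)-(6) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the covariance is diagonal as in equation (3), the usual factor 1/2 is used in the Gaussian exponent, `math.tanh` stands in for the squashing function (the text only says that it is a sigmoid with range [−1, 1]), the noise is standard normal, and the function name and argument layout are hypothetical.

```python
import math
import random

def ngnet_forward(x, centers, sigmas, w, v, m_max=1.0, beta=0.0, rng=random):
    """Return (motor outputs m, state value V, activations b, noise n).

    x       : input vector (length M)
    centers : list of N_b mean vectors mu_i
    sigmas  : list of N_b width vectors (diagonal of Sigma_i is sigma**2)
    w       : N x N_b actor weights w[k][j]
    v       : N_b critic weights v[j]
    Assumes at least one unit responds, so the normalizer is non-zero.
    """
    # Eq. (2) with the diagonal covariance of Eq. (3)
    a = []
    for mu, sg in zip(centers, sigmas):
        d2 = sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mu, sg))
        a.append(math.exp(-0.5 * d2))
    total = sum(a)
    b = [ai / total for ai in a]                      # Eq. (4)
    noise = [rng.gauss(0.0, 1.0) for _ in w]          # exploration noise n_k
    m = [m_max * math.tanh(sum(wkj * bj for wkj, bj in zip(wk, b)) + beta * nk)
         for wk, nk in zip(w, noise)]                 # Eq. (5), tanh assumed
    value = sum(vj * bj for vj, bj in zip(v, b))      # Eq. (6)
    return m, value, b, noise

# Two units in a 2-D input space, two motor outputs, exploration switched off
m, value, b, noise = ngnet_forward(
    x=[0.0, 0.0],
    centers=[[0.0, 0.0], [1.0, 1.0]],
    sigmas=[[0.5, 0.5], [0.5, 0.5]],
    w=[[1.0, -1.0], [0.5, 0.5]],
    v=[2.0, 4.0],
)
```

Returning the activations and the noise alongside the outputs is a design choice of this sketch: both are needed again when the weights are updated.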



2.2 Weights Modification Based on TD Learning 

The weight parameters $w_{kj}$ and $v_j$ are trained based on TD learning [2]:

$$\delta_t = r_{t+1} + \gamma V(x_{t+1}) - V(x_t), \tag{7}$$

$$\Delta w_{kj} = \eta_A b_j(x_t) \delta_t n_k, \tag{8}$$

$$\Delta v_j = \eta_C b_j(x_t) \delta_t, \tag{9}$$

where $\delta_t$ is called the temporal difference (TD) error, and $\eta_A$,
$\eta_C$ are the learning rates. Since the TD error $\delta_t$ represents the
relative reward of taking action $m(x_t)$ at the state $x_t$, the weight
parameters of the actor network can be modified according to equation (8). In
contrast, the parameters of the critic network can be trained based on
equation (9), so as to reduce the error, which corresponds to the difference
between the estimated state values.
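
A single update step following equations (7)-(9) might look as follows. This is a sketch under assumptions: the function name and data layout are hypothetical, the default values are the discount factor and learning rates listed for the peg pushing experiment, and the actor update is read as using the exploration noise term of the corresponding output.

```python
def td_update(w, v, b_t, noise, r_next, v_t, v_next,
              gamma=0.9, eta_a=0.1, eta_c=0.01):
    """Apply one TD update in place and return the TD error.

    w      : actor weights w[k][j]
    v      : critic weights v[j]
    b_t    : normalized activations b_j(x_t)
    noise  : exploration noise n_k used when the action was generated
    r_next : reward r_{t+1};  v_t, v_next : V(x_t) and V(x_{t+1})
    """
    delta = r_next + gamma * v_next - v_t              # Eq. (7)
    for k, wk in enumerate(w):                         # actor, Eq. (8)
        for j, bj in enumerate(b_t):
            wk[j] += eta_a * bj * delta * noise[k]
    for j, bj in enumerate(b_t):                       # critic, Eq. (9)
        v[j] += eta_c * bj * delta
    return delta

# One output, two units: a positive TD error reinforces the explored direction
w_demo = [[0.0, 0.0]]
v_demo = [0.0, 0.0]
delta_demo = td_update(w_demo, v_demo, b_t=[0.75, 0.25], noise=[2.0],
                       r_next=1.0, v_t=0.0, v_next=0.0)
```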



2.3 Evolutionary State Recruitment Strategy 

On the other hand, the model parameters (here, $\mu_i$, $\Sigma_i$ and $N_b$
of the NGnet), which are essential for the learning process, have had to be
carefully defined by human designers or adjusted based on statistics of
learning data collected off-line. However, in on-line learning, parameters
estimated in an early stage of the learning might be unreliable, because the
data would have been obtained through wrong sensorimotor coordination. Thus,
they should be modified with the progress of learning.

In [17], the on-line state division is realized by calculating an eigenvector
of the variance-covariance matrix of the weakest unit. In [18], on the other
hand, this is realized by on-line EM algorithms. To realize a computationally
cheap implementation for modifying the model parameters, in this paper we
propose an evolutionary state recruitment strategy, shown schematically in
Fig. 2.

Fig. 2. Reinforcement learning with evolutionary state recruitment strategy.



As shown in the figure, sensor inputs $x_t$ are localized by some of the
radial basis functions, whose model parameters are adjusted based on the
proposed recruitment strategy explained later. After the localization, states
$s_i$ ($i = 1, 2, \ldots, N_b$) are mapped into actuator outputs $m_t$ and
the state value $V(x_t)$ by weighted sums.

In the following, we explain how the units and their parameters are allocated
in the proposed architecture. The strategy proposed here consists of four
procedures: (1) Insertion, (2) Evaluation, (3) Selection and (4)
Mutation/Deletion. One of the most essential strategies usually adopted to
reduce the costs is, like the ABA, adding a basis unit for a newly
encountered state where no unit has been located [17,18]. In this study, if
there is no unit satisfying $a_i > a_{min}$ and if the current TD error
satisfies $\delta_t > \delta_{min}$, a new RBF unit whose center vector $\mu$
is set to the current sensory inputs is added.
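
The insertion rule can be sketched as below. The thresholds default to the values listed for the peg pushing experiment; the dictionary layout of a unit, the initial width `sigma_init` and the initial fitness of 0.0 are assumptions of this sketch (the text leaves the choice of the width to the recruitment strategy), while the initial lifetime of 1.0 follows the description of the lifetime parameter.

```python
def needs_new_unit(activations, td_error, a_min=0.4, delta_min=0.01):
    """True when no unit responds above a_min and the TD error exceeds delta_min."""
    covered = any(a > a_min for a in activations)
    return (not covered) and td_error > delta_min

def insert_unit(units, x, sigma_init):
    """Append a new RBF unit centred on the current input x.

    sigma_init and the initial fitness C are hypothetical starting values.
    """
    units.append({
        "mu": list(x),                       # centre = current sensory input
        "sigma": [sigma_init] * len(x),      # assumed initial width
        "L": 1.0,                            # lifetime of a newborn unit
        "C": 0.0,                            # assumed initial fitness
    })

units_demo = []
if needs_new_unit([0.1, 0.3], td_error=0.05):
    insert_unit(units_demo, x=[0.2, -0.4, 0.9, 0.5], sigma_init=0.3)
```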

However, the problem of how to determine the variance-covariance matrix
$\Sigma$ has to be solved. In order to determine these parameters
appropriately along with the learning process, the following state
recruitment strategy is introduced. As the NGnet is sometimes called a
mixture of experts approach [12,18], this type of network can be considered
as an ensemble of many local functions. Thus, in the context of EC, it can be
regarded as a classifier system [7,19]. In this study, we adopt the concept
of classifier systems so as to evaluate each local rule (RBF unit).

To implement the EC methodology, it is assumed that each RBF unit has
the following parameters, $L_i$ and $C_i$, for evaluating its utility in the
recruitment process. Here, $L_i$ is the lifetime of unit i and indicates its
evaluation period. It is updated at every time step according to the
following equation:

$$\Delta L_i(t) = -\tau b_i(x_t), \tag{10}$$

where $\tau$ is a dissipation coefficient. $L_i$ is set to 1.0 when unit i is
created, and the unit is under evaluation while $L_i$ is positive. For ease
of explanation, we call a unit whose L value is negative “evaluated.” The
parameter ensures that a newborn unit cannot take part in the competition
described below until it has been activated enough to be “evaluated”.
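
The lifetime bookkeeping of equation (10) is small enough to sketch directly. The function names are hypothetical, and the default dissipation coefficient is the value listed for the peg pushing experiment.

```python
def update_lifetime(life, b_i, tau=0.01):
    """One step of Eq. (10): the lifetime shrinks in proportion to the activation."""
    return life - tau * b_i

def is_evaluated(life):
    """A unit counts as "evaluated" once its lifetime has become negative."""
    return life < 0.0

# A unit that is fully activated (b_i = 1) at every step
life_demo = 1.0
for _ in range(50):
    life_demo = update_lifetime(life_demo, 1.0)
halfway_evaluated = is_evaluated(life_demo)     # still under evaluation
for _ in range(51):
    life_demo = update_lifetime(life_demo, 1.0)
finally_evaluated = is_evaluated(life_demo)     # lifetime has run out
```

Because the decrement is weighted by the activation, a unit that is rarely activated stays under evaluation for longer than one that is visited at every step.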

On the other hand, to measure the appropriateness of each RBF unit, a
fitness parameter $C_i$ is defined. The parameter represents the contribution
of unit i to the time-averaged variance of the TD error $\delta_t$. $C_i$ is
calculated as

$$C_i(t) = \lambda_c C_i(t-1) + (1 - \lambda_c) b_i(x_t) \delta_t^2, \tag{11}$$

where

$$\lambda_c = 1 - b_i(x_t) \lambda. \tag{12}$$

The decay rate $\lambda_c$ is defined so that $C_i$ is updated only when the
unit is activated [17]. Intuitively, this criterion indicates that a unit
which realizes a more appropriate state division must have a smaller $C_i$
value, because $C_i$ should be proportional to the variance of the weight
update ($\Delta w_{ij}$).
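
The fitness update of equations (11) and (12) can be sketched as follows. One assumption should be stated loudly: the last factor of equation (11) is taken here to be the squared TD error, in line with the description of $C_i$ as a contribution to the time-averaged variance of the TD error. The function name is hypothetical and the default decay rate is the value listed for the peg pushing experiment.

```python
def update_fitness(c_prev, b_i, td_error, lam=0.9):
    """One step of the activation-gated running average of Eqs. (11)-(12).

    Assumption of this sketch: the averaged quantity is b_i * td_error**2.
    """
    lam_c = 1.0 - b_i * lam                                        # Eq. (12)
    return lam_c * c_prev + (1.0 - lam_c) * b_i * td_error ** 2    # Eq. (11)

# An inactive unit (b_i = 0) keeps its previous fitness unchanged
c_inactive = update_fitness(c_prev=0.3, b_i=0.0, td_error=5.0)
# A fully active unit moves towards the current squared TD error
c_active = update_fitness(c_prev=0.0, b_i=1.0, td_error=0.5)
```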

Based on the above two parameters, the following recruitment strategy is
executed. In each step, if there is no unit to approximate the current input
region (i.e. $a_i < a_{min}$ for all i), a new RBF unit whose center vector
$\mu$ is set to the current inputs is inserted. In contrast, if there is only
one unit, the recruitment process does nothing, since the unit can be
regarded as a local expert. Furthermore, if there are more than two units
which satisfy the “active”¹ and “evaluated” conditions (these units are
called “competing units”), two of them are randomly selected.

For ease of explanation, assume now that units $P_1$ and $P_2$ (here
$C_{P_1} < C_{P_2}$) are randomly chosen. If the unit which has the smaller C
value (i.e. $P_1$) is not trained enough (namely $C_{P_1} > C_{min}$), the
unit $P_1$ is duplicated and the clone unit ($P_1'$) is mutated according to
equation (13). On the other hand, the individual which has the larger C value
($P_2$) is deleted.

$$\sigma_i' = \sigma_i \cdot \left[ 1 + \eta_\sigma N(0, C_{P_2} - C_{P_1}) \right], \tag{13}$$

where $N(0, C_{P_2} - C_{P_1})$ is a random noise generator based on a normal
distribution.
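
One competition step (selection, mutation and deletion) might be sketched as follows. Several points are assumptions of this sketch rather than statements from the text: the step runs whenever at least two units qualify, the second argument of $N(0, \cdot)$ is treated as a variance, the mutation adds the scaled noise to 1, the deletion of the weaker unit is unconditional, and the clone inherits the fitness of its parent while restarting with a lifetime of 1.0. The dictionary layout of a unit is hypothetical.

```python
import math
import random

def compete(units, a_act=0.1, c_min=0.005, eta_sigma=0.05, rng=random):
    """Run one selection/mutation/deletion step on `units` (modified in place).

    Each unit is a dict with keys "mu", "sigma", "a" (current activation),
    "L" (lifetime) and "C" (fitness). Returns True if a competition took place.
    """
    # Competing units are both "active" and "evaluated"
    competing = [u for u in units if u["a"] > a_act and u["L"] < 0.0]
    if len(competing) < 2:          # assumption: two candidates are enough
        return False
    p1, p2 = rng.sample(competing, 2)
    if p1["C"] > p2["C"]:
        p1, p2 = p2, p1             # p1 is the fitter unit (smaller C)
    if p1["C"] > c_min:
        # Eq. (13): widths of the clone are perturbed multiplicatively
        noise = rng.gauss(0.0, math.sqrt(p2["C"] - p1["C"]))
        clone = {
            "mu": list(p1["mu"]),
            "sigma": [s * (1.0 + eta_sigma * noise) for s in p1["sigma"]],
            "a": p1["a"],
            "L": 1.0,               # a newborn unit starts a new evaluation
            "C": p1["C"],           # assumed: fitness inherited from parent
        }
        units.append(clone)
    units[:] = [u for u in units if u is not p2]   # delete the weaker unit
    return True

rng_demo = random.Random(0)
pool = [
    {"mu": [0.0, 0.0], "sigma": [0.3, 0.3], "a": 0.5, "L": -0.1, "C": 0.02},
    {"mu": [0.1, 0.0], "sigma": [0.3, 0.3], "a": 0.6, "L": -0.2, "C": 0.08},
    {"mu": [2.0, 2.0], "sigma": [0.3, 0.3], "a": 0.05, "L": -1.0, "C": 0.5},
]
competed = compete(pool, rng=rng_demo)
```

In the usage example only the first two units are competing, so the second one (larger C) is removed and a mutated copy of the first one is appended.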



3 Peg Pushing Robot Control 

3.1 Task Settings 

The proposed evolutionary recruitment strategy has been applied to the
controller design of a peg pushing robot.

¹ The term “active” is used to represent an RBF unit whose current $a_i$
value is larger than a threshold $a_{act}$.

Fig. 3. Configuration for peg pushing task.



The task of peg pushing is schematically illustrated in Fig. 3. As shown in
the figure, there are three objects: a light source (hereafter “Light”), a
cylindrical object (“Peg”) and a mobile robot (“Robot”). The aim of the Robot
is to push the Peg toward the Light as fast as possible. For simplicity, it
is assumed that the initial positions of these three objects are
predetermined. However, at the beginning of each trial, the initial heading
direction of the Robot ($\theta_R$) is chosen at random within $[-\pi, \pi]$.

Accordingly, the sensorimotor mapping (i.e. controller) of the Robot consists
of four continuous inputs ($\theta_{PR}$, $d_{PR}$, $\theta_{LR}$, $d_{LR}$)
and two outputs ($v_L$ and $v_R$). Here, $\theta_{PR}$ and $\theta_{LR}$
represent the angles between the Peg/Light and the Robot (within
$[-\pi, \pi]$), $d_{PR}$ and $d_{LR}$ indicate the distances between the
Peg/Light and the Robot (normalized to [0, 1] by using a squashing function),
and $v_L$ and $v_R$ are the left and right motor velocities, respectively.



3.2 Evaluation Criteria 

In this simulation, the behavior of the Robot is evaluated based on the
following criteria $f_1$ and $f_2$; thus the reward $r_t$ is calculated as

$$r_t = f_1 \cdot f_2, \tag{14}$$

$$f_1 = \exp\left[ -\kappa \cdot \frac{d_{LR}(t)}{d_{init}} \right], \tag{15}$$

$$f_2 = \begin{cases} 1.0 & \text{if } d_{PR}(t) < R_{Robot} + R_{Peg}, \\ 0.0 & \text{otherwise}, \end{cases} \tag{16}$$

where $d_{init}$ is the initial distance between the Robot and the Light
(i.e. $d_{init} = d_{LR}(0)$), $R_{Robot}$ and $R_{Peg}$ denote the radii of
the Robot and the Peg, respectively, and $\kappa$ is a positive constant. The
first criterion ($f_1$) measures the task performance, and the second one
($f_2$) prompts the Robot to keep contact with the Peg at all times. The
product of these two measurements is given as the reinforcement signal at
each time step. The parameters assigned in this experiment are listed in
Table 1.
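
The reward computation of equations (14)-(16) can be sketched as below. The extracted source does not settle which distance enters $f_1$; this sketch assumes the Robot-Light distance, which matches the normalization by the initial Robot-Light distance. The function name, argument names and the radii used in the example are hypothetical, and the default decay constant is the value listed for the reward landscape.

```python
import math

def reward(d_lr, d_pr, d_init, r_robot, r_peg, kappa=0.5):
    """Reward r_t = f_1 * f_2 for one time step.

    d_lr   : current Robot-Light distance (assumed argument of f_1)
    d_pr   : current Robot-Peg distance
    d_init : initial Robot-Light distance
    """
    f1 = math.exp(-kappa * d_lr / d_init)           # Eq. (15), task progress
    f2 = 1.0 if d_pr < r_robot + r_peg else 0.0     # Eq. (16), contact with Peg
    return f1 * f2                                   # Eq. (14)

# Hypothetical radii: Robot 0.10, Peg 0.05 (same length unit as the distances)
r_contact_at_goal = reward(d_lr=0.0, d_pr=0.1, d_init=1.0, r_robot=0.1, r_peg=0.05)
r_contact_at_start = reward(d_lr=1.0, d_pr=0.1, d_init=1.0, r_robot=0.1, r_peg=0.05)
r_no_contact = reward(d_lr=0.0, d_pr=0.5, d_init=1.0, r_robot=0.1, r_peg=0.05)
```

Because the two criteria are multiplied, the Robot earns nothing for approaching the Light unless it is also in contact with the Peg.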



Table 1. Parameters used in the peg pushing task.

Symbol            Description                                  Value
$\beta$           noise ratio                                  1.0
$\gamma$          discount factor                              0.9
$\eta_A$          learning rate for Actor                      0.1
$\eta_C$          learning rate for Critic                     0.01
$\eta_\sigma$     learning rate for radius of each RBF unit    0.05
$a_{min}$         threshold for RBF addition                   0.4
$\delta_{min}$    threshold for allowable TD error             0.01
$C_{min}$         threshold for evaluation of the individual   0.005
$\tau$            dissipation factor for lifetime              0.01
$\lambda$         decay rate for fitness time-averaging        0.9
$a_{act}$         threshold for being competing unit           0.1
$\kappa$          decay rate for reward landscape              0.5



3.3 Results 

To confirm the advantage of the proposed method, it was compared with
conventional methods, in each of which an RBF of fixed size
($\Sigma = \sigma^2 I$; $\sigma$ = 0.2, 0.3 and 0.4, where I indicates the
identity matrix) is incrementally added for unexperienced regions of the
sensory space (i.e. regions where no RBF unit has been located).

Fig. 4 shows the comparison of the averaged rewards and their error bars in
those cases. From the figure, it can be seen that (1) the smaller state
division ($\sigma$ = 0.2) achieves better performance, and (2) the error bars
grow as the RBF becomes wider. However, as can be seen in Fig. 5, the smaller
division results in wasteful use of RBF units. In contrast, the averaged
reward of the proposed evolutionary recruitment strategy is on a par with the
smallest fixed case (i.e. $\sigma$ = 0.2) in spite of requiring fewer RBF
units. Due to this, the proposed method requires less computational resources
(e.g. CPU time) than the conventional one, although it incurs additional
processing for the recruitment strategy.

Fig. 4. Comparison of the averaged rewards.

Fig. 5. Comparison of the number of RBF units.



Moreover, as the error bar of the proposed method is narrower than that of
the smallest fixed case, it can be experimentally concluded that the proposed
method makes the learning process more stable. In addition, the proposed
method can be considered to realize a parameter-free adaptation with respect
to the number of RBF units, since designers do not have to care about the
variance parameters in advance.

A typical trajectory of the Robot during a trial (30 sec) and 2D projections
(i.e. contour lines) of the critic ($V(\theta_{PR}, \theta_{LR})$,
$V(\theta_{LR}, d_{LR})$) and the actor outputs
($v_R(\theta_{PR}, \theta_{LR})$, $v_L(\theta_{PR}, \theta_{LR})$) are
displayed in Fig. 6. In these projections, brighter regions represent higher
values, and vice versa. It can be seen that (1) $V(\theta_{PR}, \theta_{LR})$
becomes high when the Robot is located in front of both the Peg and the
Light, and (2) $V(\theta_{LR}, d_{LR})$ becomes higher as the Robot
approaches the Light. As the mobile robot assumed in this simulation has
non-holonomic characteristics, there is a singular posture (e.g. the Robot
cannot push the Peg at $|\theta_{PR}| = \pi/2$ [rad]). Although there is no
external reward for avoiding the singularity, it is confirmed that the Robot
can correctly learn its value function.

Fig. 6. Trajectories of the Robot.

Furthermore, Fig. 7 illustrates the resultant trajectories from eight
different initial heading directions. From these figures, we can see that the
Robot can successfully push the Peg toward the Light from any initial
posture. For example, if the initial heading is as shown in Fig. 7(e), the
Robot starts to move backward, and it pushes the Peg toward the left rear. As
a consequence, the positions of the objects result in the same configuration
as in Fig. 7(b), so the Robot can push the Peg similarly. It can be
considered that the continuous representation of the sensorimotor mapping
implemented by the NGnet makes the system robust with respect to the initial
heading direction.

In addition, Fig. 8 shows snapshots of peg-pushing robots in a physical
simulator. The left-hand figures show the resultant trajectories on a floor
with a high coefficient of dynamic friction, while the right-hand figures
show the case of a low-friction floor.

4 Conclusions 

In this paper, we have proposed an evolutionary state recruitment strategy
for continuous and high dimensional function approximation. To apply the
proposed method to designing robot controllers, we have proposed the combined
use of actor-critic RL and the method. According to the results of the peg
pushing robot simulation, the advantages of adopting the method are:
(1) designers of the systems do not have to predetermine the model parameters
(i.e. in this paper, the center vector and variance of each RBF), and (2) the
number of RBF units used in the obtained network is kept small according to
the task complexity and the learning process.

(a) heading direction 0 [deg]    (b) heading direction 45 [deg]
(c) heading direction 90 [deg]   (d) heading direction 135 [deg]
(e) heading direction 180 [deg]  (f) heading direction 225 [deg]
(g) heading direction 270 [deg]  (h) heading direction 315 [deg]

Fig. 7. Robustness with respect to heading directions.

(a) High friction coefficient    (b) Low friction coefficient

Fig. 8. Difference against friction coefficient parameters.

Abstract. Simple linear readouts from generic neural microcircuit models
can be trained to generate and control basic movements, e.g., reaching
with an arm to various target points. After suitable training of these
readouts on a small number of target points, reaching movements to
nearby points can also be generated. Sensory or proprioceptive feedback
turns out to improve the performance of the neural microcircuit
model if it arrives with a significant delay of 25 to 100 ms. Furthermore,
additional feedbacks of “prediction of sensory variables” are shown
to improve the performance significantly. Existing control methods in
robotics that take the particular dynamics of sensors and actuators into
account (“embodiment of robot control”) are taken one step further with
this approach, which provides methods for also using the “embodiment of
computation”, i.e. the inherent dynamics and spatial structure of neural
circuits, for the design of robot movement controllers.



1 Introduction 

This article demonstrates that simple linear readouts from generic neural mi- 
crocircuit models consisting of spiking neurons and dynamic synapses can be 
trained to generate and control rather complex movements. Using biologically 
realistic neural circuit models to generate and control movements is not so easy, 
since these models are made of spiking neurons and dynamic synapses which 
exhibit a rich inherent dynamics on several temporal scales. This tends to be in 
conflict with movement control tasks that require focusing on a relatively slow 
time scale. 

Preceding work on movement control has drawn attention to the need to take
the “embodiment of motor systems”, i.e. the inherent dynamics of sensors
and actuators, into account. This approach is taken one step further in this
article, as it provides a method for also taking into account the “embodiment
of neural computation”, i.e. the inherent dynamics and spatial arrangement of
neural circuits that control the movements. Hence it may be seen as a first
step in a long range program where abstract control principles for biological
movement control and related models developed for artificial neural networks
[Tani, 2003] can be implemented and tested on arbitrarily realistic models
for the underlying neural circuitry.

* The work was partially supported by the Austrian Science Fund FWF, project #
P15386.

The feasibility of our approach is demonstrated in this article by showing
that simple linear readouts from a generic neural microcircuit model can be
trained to control a 2-joint robot arm, which is a common benchmark task
for testing methods for nonlinear control [Slotine and Li, 1991]. It turns out
that both the spatial organization of information streams, especially the
spatial encoding of slowly varying input variables, and the inherent dynamics
of the generic neural microcircuit model have a significant impact on its
capability to control movements. In particular, it is shown that the inherent
dynamics of neural microcircuits allows these circuits to cope with rather
large delays for proprioceptive and sensory feedback. In fact, it turns out
that their performance is optimal for delays that lie in the range of 25 to
100 ms. Additionally, it is shown that the generic neural microcircuit models
used here possess a significant capability for temporal integration. It is
also demonstrated that this new paradigm of motor control provides
generalization capabilities to the readouts. Furthermore, it is shown that
the same neural microcircuit model can be trained simultaneously to predict
the results of such feedbacks, and that by using these predicted feedbacks it
can improve its performance significantly in cases where feedback arrives
with other delays, or not at all.

This work complements preceding work where generic neural microcircuit
models were used in an open loop for a variety of simulated sensory
processing tasks ([Buonomano and Merzenich, 1995], [Maass et al., 2002],
[Maass et al., 2003]). It turns out that the demands on the precision of
real-time computations carried out by such circuit models are substantially
higher for closed-loop applications such as those considered in this article.
Somewhat similar paradigms for neural control based on artificial neural
network models have been independently explored by Herbert Jaeger
[Jager, 2002].

The neural microcircuit model and the control tasks considered in this article 
are specified in the subsequent two sections. Results of computer simulations are 
presented in sections 4, 5, 6 and relations to theoretical results are discussed in 
section 7. 

2 Generic Neural Microcircuit Models 

In contrast to common artificial neural network models, neural microcircuits
in biological organisms consist of diverse components, such as different
types of spiking neurons and dynamic synapses, that are each endowed with
inherently complex dynamics of their own. This makes it difficult to
construct, out of biologically realistic computational units, implementations
of the boolean or analog circuits that have turned out to be useful in the
context of computer science or artificial neural networks. On the other hand,
it opens the path towards alternative computational paradigms based on
emergent computations in sparsely and recurrently connected neural
microcircuits, composed of diverse dynamic components [Maass et al., 2002].
In [Maass et al., 2002] a new computational model, the liquid state machine,
has been proposed that can be used to explain and analyze the capabilities of
such neural microcircuits for real-time computing. Consequences of this
analysis for applications to closed loop control are discussed in section 7
of this article. Instead of constructing circuits for specific tasks, one
considers here various probability distributions for neural connectivity.
Such circuits have inherent capabilities for temporal integration of
information from several segments of incoming input streams, and relevant
computations that recombine pieces of this information in a nonlinear manner
emerge automatically. To be exact, the information about the outputs of a
very large class of computations on information contained in the input stream
is automatically present in the “liquid state” of the dynamical system in the
sense of [Maass et al., 2002], see [Natschläger and Maass, 2004].

The liquid state x(t) models that part of the current circuit state that is
in principle “visible” to a readout neuron (see Fig. 1, a) that receives
synaptic inputs from all neurons in the circuit. Each component of x(t)
models the impact that a particular neuron v may have on the membrane
potential of a generic readout neuron (see Fig. 2). Thus each spike of neuron
v is replaced by a pulse whose amplitude decays exponentially with a time
constant of 30 ms. In other words: x(t) is obtained by applying a low-pass
filter to the spike trains emitted by the neurons in the generic neural
microcircuit model. We will only consider information that can be extracted
from the liquid state x(t) of the generic neural microcircuit model by a
simple weighted sum¹ w · x(t). The weight vector w will be fixed for all t
and all circuit inputs once the training of the (symbolic) readout neurons
has been completed.
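
The construction of one liquid-state component and the linear readout can be sketched as follows. This is a minimal illustration, assuming unit pulse amplitude, spike times given in seconds and a readout whose last input is the constant bias component; the function names are hypothetical.

```python
import math

def liquid_state_component(spike_times, t, tau=0.030):
    """Low-pass filtered spike train of one neuron at time t (seconds).

    Every spike at or before t contributes a pulse of (assumed) unit
    amplitude that decays exponentially with time constant tau = 30 ms.
    """
    return sum(math.exp(-(t - s) / tau) for s in spike_times if s <= t)

def readout(w, x):
    """Weighted sum w . x(t); the last entry of x is a constant 1 for the bias."""
    return sum(wi * xi for wi, xi in zip(w, x))

# Two neurons observed at t = 30 ms; spikes after t are ignored
x_t = [
    liquid_state_component([0.000], t=0.030),          # one spike, 30 ms ago
    liquid_state_component([0.030, 0.050], t=0.030),   # one spike right now
    1.0,                                               # constant bias component
]
y_t = readout([2.0, 3.0, 0.5], x_t)
```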

In principle one can of course also view various parameters within the circuit 
as being subject to learning or adaptation, for example in order to optimize the 
dynamics of the circuit for a particular range of control tasks. However this has 
turned out to be not necessary for the applications described in this article. One 
advantage of just viewing the weight vector w as being plastic is that learning 
is quite simple and robust, since it just amounts to linear regression - in spite of 
the highly nonlinear nature of the control tasks to which this set-up is applied. 
Another advantage is that the same neural microcircuit could potentially be 
used for various other information processing tasks (e.g. prediction of sensory 
feedback, see section 6) that may be desirable for the same or other tasks. 
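
Since training only the weight vector w amounts to linear regression, a least-squares fit is enough to illustrate it. The sketch below solves the normal equations with plain Gaussian elimination; it assumes that the recorded states (each ending with the constant bias component) have full column rank, and the function name and the toy data are hypothetical.

```python
def fit_readout(states, targets):
    """Least-squares weights w such that w . state approximates each target.

    states  : list of state vectors x(t), each including the constant 1
    targets : list of desired readout values
    Assumes the states have full column rank (no regularization is applied).
    """
    n = len(states[0])
    # Normal equations: (X^T X) w = X^T y
    A = [[sum(s[i] * s[j] for s in states) for j in range(n)] for i in range(n)]
    rhs = [sum(s[i] * y for s, y in zip(states, targets)) for i in range(n)]
    # Forward elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(A[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (rhs[i] - tail) / A[i][i]
    return w

# Toy data generated by target = 2 * x + 1, with the bias as second component
toy_states = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
toy_targets = [1.0, 3.0, 5.0, 7.0]
w_fit = fit_readout(toy_states, toy_targets)
```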

The generic microcircuit models used for the closed loop control tasks de- 
scribed in this article were similar in structure to those that were earlier used 
for various sensory processing tasks. More precisely, we considered circuits con- 
sisting of 600 leaky-integrate-and-fire neurons arranged on the grid points of a 
20 X 5 X 6 cube in 3D (see Fig. 1, b). 20 % of these neurons were randomly cho- 
sen to be inhibitory. Synaptic connections were chosen according to a probability 



^1 One constant component is added to x(t) to facilitate the implementation of a
constant bias in terms of the form w · x(t).

Fig. 1. a) Information flow diagram for a neural microcircuit model applied to a control
task. The “plant” may be a motor system or some part of the environment. b) Spatial
layout of neurons in the models considered in this article. The 6 layers on the left hand
side are used for spatial coding of inputs to the circuit (x_dest, y_dest, θ1(t − Δ),
θ2(t − Δ), τ1(t), τ2(t)). Connections between these 6 input layers, as well as between neurons
in the subsequent 6 processing layers, are chosen randomly according to a probability
distribution discussed in the text. c) Standard model of a 2-joint robot arm. d) Initial
position A and end position B of the robot arm for one of the movements. The target
trajectories of the tip of the robot arm and of the elbow are indicated by dashed lines.




Fig. 2. Snapshots of the liquid state of a neural microcircuit model at 3 different time
points. The circuit has 600 neurons which are shown in a 30 × 20 grid. Snapshots are
taken at 100, 200 and 300 ms.

distribution that favored local connections.^2 Parameters of neurons and synapses
were chosen to fit data from microcircuits in rat somatosensory cortex (based on
[Gupta et al., 2000] and [Markram et al., 1998]).^3 We modeled the (short term)
dynamics of synapses according to the model proposed in [Markram et al., 1998],
with the synaptic parameters U (use), D (time constant for depression), F
(time constant for facilitation) randomly chosen from Gaussian distributions
that model empirically found data for such connections.^4
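
The distance-dependent wiring rule given in the footnote on connection probabilities, C · exp(−D²(a,b)/λ²), can be sketched as follows. Several details are assumptions made for illustration only: unit spacing between grid points, the first letter of each type pair denoting the presynaptic neuron, the exclusion of self-connections, and all function names.

```python
import math
import random

LAMBDA = 1.2  # value of the distance parameter stated in the footnote
# Values of C per connection type; ASSUMPTION: first letter = presynaptic type.
C_BY_TYPE = {("E", "E"): 0.3, ("E", "I"): 0.2, ("I", "E"): 0.4, ("I", "I"): 0.1}


def connection_probability(pos_a, pos_b, type_a, type_b, lam=LAMBDA):
    """P(a -> b) = C * exp(-D(a, b)^2 / lambda^2), with D the Euclidean distance."""
    dist_sq = sum((p - q) ** 2 for p, q in zip(pos_a, pos_b))
    return C_BY_TYPE[(type_a, type_b)] * math.exp(-dist_sq / lam ** 2)


def build_circuit(dims=(20, 5, 6), inhibitory_fraction=0.2, seed=0):
    """Place neurons on a 3D grid and sample directed synaptic connections."""
    rng = random.Random(seed)
    positions = [(x, y, z) for x in range(dims[0])
                 for y in range(dims[1]) for z in range(dims[2])]
    n = len(positions)
    # Randomly choose 20% of the neurons to be inhibitory.
    inhibitory = set(rng.sample(range(n), round(inhibitory_fraction * n)))
    types = ["I" if i in inhibitory else "E" for i in range(n)]
    edges = [(a, b) for a in range(n) for b in range(n)
             if a != b and rng.random() < connection_probability(
                 positions[a], positions[b], types[a], types[b])]
    return types, edges
```

Because the probability falls off with the squared distance, most sampled connections link neighboring grid points, which is the locality the text describes.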

In order to test the noise robustness of movement generation by the neural
microcircuit model, the initial condition of the circuit was randomly drawn (initial
membrane potential for each neuron drawn uniformly from the interval [13.5
mV, 14.9 mV], where 15 mV was the firing threshold). In addition a substantial
amount of noise was added to the input current of each neuron throughout the
simulation: at each time step, a new value for the noise input current with mean
0 and SD of 1 nA was drawn for each neuron and added to its input
current.
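
To make the role of the random initial condition and of the per-step noise current concrete, the following sketch simulates a single, unconnected leaky integrate-and-fire neuron with the parameter values quoted in this section and in the neuron-parameter footnote. The forward-Euler update, the step size of 0.2 ms, the omission of synaptic input, the choice not to draw noise during the refractory period, and the function name are all assumptions made for illustration.

```python
import random


def simulate_lif(duration_ms=100.0, dt_ms=0.2, seed=0, excitatory=True):
    """Return the spike times (ms) of one noisy leaky integrate-and-fire neuron."""
    rng = random.Random(seed)
    tau_ms, r_mohm, threshold_mv = 30.0, 1.0, 15.0  # resting potential taken as 0 mV
    refractory_ms = 3.0 if excitatory else 2.0
    v = rng.uniform(13.5, 14.9)             # random initial membrane potential (mV)
    v_reset = rng.uniform(13.8, 14.5)       # per-neuron reset voltage (mV)
    i_background = rng.uniform(13.5, 14.5)  # constant background current (nA)
    refractory_left = 0.0
    spikes = []
    for step in range(int(round(duration_ms / dt_ms))):
        if refractory_left > 0.0:
            refractory_left -= dt_ms
            continue
        i_total = i_background + rng.gauss(0.0, 1.0)  # new noise value every step
        # Forward-Euler step of tau * dV/dt = -V + R * I (MOhm * nA gives mV).
        v += dt_ms / tau_ms * (-v + r_mohm * i_total)
        if v >= threshold_mv:
            spikes.append(step * dt_ms)
            v = v_reset
            refractory_left = refractory_ms
    return spikes
```

Since the background current alone drives the membrane potential to just below threshold, threshold crossings in this sketch are triggered by the noise term.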

This generic neural microcircuit model received analog input streams from 
6 sources (from 8 sources in the experiment with internal predictions discussed 
in Fig. 6, b and 7). The outcomes of the experiments discussed in this article 

^2 The probability of a synaptic connection from neuron a to neuron b (as well
as that of a synaptic connection from neuron b to neuron a) was defined as
C · exp(−D²(a,b)/λ²), where D(a,b) is the Euclidean distance between neurons
a and b and λ is a parameter which controls both the average number of connections
and the average distance between neurons that are synaptically connected (we set
λ = 1.2). Depending on whether the pre- or postsynaptic neuron was excitatory
(E) or inhibitory (I), the value of C was set according to [Gupta et al., 2000] to 0.3
(EE), 0.2 (EI), 0.4 (IE), 0.1 (II).

^3 Neuron parameters: membrane time constant 30 ms, absolute refractory period 3 ms
(excitatory neurons), 2 ms (inhibitory neurons), threshold 15 mV (for a resting
membrane potential assumed to be 0), reset voltage drawn uniformly from the interval
[13.8 mV, 14.5 mV] for each neuron, constant non-specific background current I_b
uniformly drawn from the interval [13.5 nA, 14.5 nA] for each neuron, noise at each
time step I_noise drawn from a Gaussian distribution with mean 0 and SD of 1 nA,
input resistance 1 MΩ. For each simulation, the initial conditions of each I&F neuron,
i.e., the membrane voltage at time t = 0, were drawn randomly (uniform
distribution) from the interval [13.5 mV, 14.9 mV].

^4 Depending on whether a and b were excitatory (E) or inhibitory (I), the mean values
of these three parameters (with D, F expressed in seconds, s) were chosen according
to [Gupta et al., 2000] to be .5, 1.1, .05 (EE), .05, .125, 1.2 (EI), .25, .7, .02 (IE),
.32, .144, .06 (II). The SD of each parameter was chosen to be 50% of its mean. The
mean of the scaling parameter A (in nA) was chosen to be 70 (EE), 150 (EI), -47
(IE), -47 (II). In the case of input synapses the parameter A had a value of 70 nA
if projecting onto an excitatory neuron and -47 nA if projecting onto an inhibitory
neuron. The SD of the A parameter was chosen to be 70% of its mean, and A was drawn
from a gamma distribution. The postsynaptic current was modeled as an exponential
decay exp(−t/τ_s) with τ_s = 3 ms (τ_s = 6 ms) for excitatory (inhibitory) synapses.
The transmission delays between liquid neurons were chosen uniformly to be 1.5 ms
(EE), and 0.8 ms for the other connections.

were all negative if these analog input streams were directly fed into the circuit
(as input current for selected neurons in the circuit). Apparently the variance
of the resulting spike trains was too large to make the information about the
slowly varying values of these input streams readily accessible to the circuit.
Therefore the input streams were instead fed into the circuit with a simple form
of spatial coding, where the location of neurons that were activated encoded
the current values of the input variables. More precisely, each input variable is
first scaled into the range [0, 1]. This range is linearly mapped onto an array of
50 symbolic input neurons. At each time step t, one of these 50 neurons is selected,
whose number n(t) ∈ {1, ..., 50} reflects the current value i_n(t) ∈ [0, 1], the
normalized value of input variable i(t) (e.g. n(t) = 1 if i_n(t) = 0, n(t) = 50 if
i_n(t) = 1). The neuron n(t) then outputs at time step t the value of i(t). In
addition the 3 closest neighbors on both sides of neuron n(t) in this linear array
get activated at time t by a scaled down amount according to a Gaussian function
(the neuron number n outputs at time step t the value

$$ i(t)\, e^{-\frac{(n - n(t))^2}{2\sigma^2}}, $$

where σ = 0.8). Thus the value of each of the 6 input variables is encoded at any
time by the output values of the associated 50 symbolic input neurons (of which
at least 43 neurons output at any time the value 0). The neurons in each of these
6 linear arrays are connected with one of the 6 layers, each consisting of 100 neurons,
in the previously described circuit of 100 × 6 ((20 × 5) × 6) integrate-and-fire
neurons.^5 The spatial arrangement of the 6 input pools can be seen on the left
of the 6 layers of the circuit of spiking neurons in Fig. 1 b.
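
The spatial coding scheme described above can be sketched as follows. The sketch assumes that the center neuron is found by rounding to the nearest array position and that the Gaussian has the standard form exp(−(n − n(t))² / (2σ²)); the function name and the explicit range arguments lo and hi are hypothetical.

```python
import math

N_INPUT_NEURONS = 50  # symbolic input neurons per input variable
SIGMA = 0.8           # width of the Gaussian, as stated in the text
N_NEIGHBORS = 3       # neighbors activated on each side of the center neuron


def spatial_code(value, lo, hi):
    """Encode one input variable as the outputs of 50 symbolic input neurons.

    lo and hi are the bounds used to scale the variable into [0, 1].
    ASSUMPTION: the center neuron is chosen by rounding to the nearest position.
    """
    normalized = (value - lo) / (hi - lo)
    center = 1 + round(normalized * (N_INPUT_NEURONS - 1))  # n(t) in 1..50
    outputs = [0.0] * N_INPUT_NEURONS
    first = max(1, center - N_NEIGHBORS)
    last = min(N_INPUT_NEURONS, center + N_NEIGHBORS)
    for n in range(first, last + 1):
        # The center neuron outputs the raw value; neighbors are scaled down.
        outputs[n - 1] = value * math.exp(-((n - center) ** 2) / (2 * SIGMA ** 2))
    return outputs
```

At most 7 of the 50 neurons are active for any input value, which matches the statement that at least 43 neurons output 0 at any time.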

3 A 2-Joint Robot Arm as a Benchmark Nonlinear 
Control Task 

We used the generic neural microcircuit models in a closed loop (see Fig. 1, a and 
Fig. 3) as controllers for a 2-joint robot arm. More precisely, we trained them to 
control exactly the same model for a 2-joint robot arm (see Fig. 1, c) that is used 
in [Slotine and Li, 1991] as a standard reference model for a complex nonlinear 
control task (see in particular ch. 6 and 9). It is assumed that the arm is moving 
in a horizontal plane, so that gravitational forces can be ignored. 

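Using the well-known Lagrangian equation in classical dynamics, the dynamic
equations for our arm model are given by equation 1:

$$
\begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}
\begin{bmatrix} \ddot{\theta}_1 \\ \ddot{\theta}_2 \end{bmatrix}
+
\begin{bmatrix} -h\dot{\theta}_2 & -h(\dot{\theta}_1 + \dot{\theta}_2) \\ h\dot{\theta}_1 & 0 \end{bmatrix}
\begin{bmatrix} \dot{\theta}_1 \\ \dot{\theta}_2 \end{bmatrix}
=
\begin{bmatrix} \tau_1 \\ \tau_2 \end{bmatrix}
\qquad (1)
$$

with θ = [θ1 θ2]^T being the two joint angles, τ = [τ1 τ2]^T being the joint input
torques, and

$$ H_{11} = m_1 l_{c1}^2 + I_1 + m_2 \left[ l_1^2 + l_{c2}^2 + 2 l_1 l_{c2} \cos\theta_2 \right] + I_2 $$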

^5 with a value of 3.3 for λ in the formula for the connection probability given in
footnote 2.

Fig. 3. During training the neural circuit receives the values θ1(t − Δ), θ2(t − Δ),
τ1(t), τ2(t) as additional external inputs. During validation the circuit is run in
a closed loop with the arm model, and these 4 external inputs are replaced by internal
feedbacks as indicated. The only external inputs during validation are the constant
inputs x_dest, y_dest that give, in cartesian coordinates, the target position of the tip of
the robot arm after the movement has been completed. All the dynamics needed to
generate the movement is then provided by the inherent dynamics of the neural
circuit in response to the switching on of these constant inputs (and in response to the
dynamics of the feedbacks).



$$ H_{12} = H_{21} = m_2 l_1 l_{c2} \cos\theta_2 + m_2 l_{c2}^2 + I_2 $$

$$ H_{22} = m_2 l_{c2}^2 + I_2 $$

$$ h = m_2 l_1 l_{c2} \sin\theta_2 $$

Equation 1 can be compactly represented as:

$$ H(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} = \tau $$

where H represents the inertia matrix, and C represents the matrix of Coriolis
and centripetal terms. The values of the parameters that were used in our
simulations were: m1 = 1, m2 = 1, lc1 = 0.25, lc2 = 0.25, I1 = 0.03, I2 = 0.03.
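
Equation 1 can be evaluated directly to obtain the torques required for a given trajectory, which is how target values for training are described as being obtained. The sketch below follows the terms of equation 1 one by one. The link length L1 is not listed among the parameters above, so the value 0.5 used here is purely an assumption for illustration, as is the function name.

```python
import math

M1, M2 = 1.0, 1.0      # link masses, from the text
LC1, LC2 = 0.25, 0.25  # distances to the centers of mass, from the text
I1, I2 = 0.03, 0.03    # moments of inertia, from the text
L1 = 0.5               # ASSUMPTION: length of the first link (not given in the text)


def inverse_dynamics(theta, dtheta, ddtheta):
    """Joint torques (tau1, tau2) from angles, velocities and accelerations."""
    _, th2 = theta
    d1, d2 = dtheta
    dd1, dd2 = ddtheta
    h11 = M1 * LC1 ** 2 + I1 + M2 * (L1 ** 2 + LC2 ** 2 + 2 * L1 * LC2 * math.cos(th2)) + I2
    h12 = M2 * L1 * LC2 * math.cos(th2) + M2 * LC2 ** 2 + I2  # equals H21
    h22 = M2 * LC2 ** 2 + I2
    h = M2 * L1 * LC2 * math.sin(th2)
    # Inertia terms plus Coriolis and centripetal terms, row by row.
    tau1 = h11 * dd1 + h12 * dd2 - h * d2 * d1 - h * (d1 + d2) * d2
    tau2 = h12 * dd1 + h22 * dd2 + h * d1 * d1
    return tau1, tau2
```

Calling the function at every time step of a target trajectory yields the torque sequence that a readout would be trained to reproduce.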

The closed loop control system that was used to generate the results discussed
in this article is shown in Fig. 3. During training of the readouts from the generic
neural microcircuit model the circuit was used in an open loop with target values
for the output torques provided by equation 1 (for a given target trajectory
{θ1(t), θ2(t)}), and feedbacks from the plant replaced by the target values of
these feedbacks for the target trajectory. More precisely, for each such target
trajectory 20 variations of the training samples were generated, where for each
time step t,^6 a different noise value of 0.01 × ρ, multiplied by the current value
of the input channel, was added to each of the input channels, where ρ is a random
number drawn from a Gaussian distribution with mean 0 and SD 1.
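
The generation of noisy training variations can be sketched as follows. The function name and the data layout (a list of time steps, each holding one value per input channel) are assumptions made for illustration.

```python
import random


def noisy_variations(trajectory, n_variations=20, scale=0.01, seed=0):
    """Return noisy copies of a trajectory of input-channel values.

    trajectory: list of time steps, each a list with one value per input channel.
    At every time step and channel, scale * rho * value is added to the value,
    with rho drawn from a Gaussian with mean 0 and SD 1.
    """
    rng = random.Random(seed)
    return [
        [[v + scale * rng.gauss(0.0, 1.0) * v for v in step] for step in trajectory]
        for _ in range(n_variations)
    ]
```

Because the noise is multiplicative, channels whose current value is 0 remain unperturbed, and larger input values receive proportionally larger perturbations.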

The purpose of this extended training procedure was to make the readout 
robust with regard to deviation from the target trajectory caused by imprecision 
in earlier torque outputs (see section 7). Each target trajectory had a time 
duration of 500 ms. 



4 Teaching a Generic Neural Microcircuit Model to 
Generate Basic Movements 

As a first task the generic neural microcircuit model described in section 2 was
taught to generate 4 separate movements with the 2-joint arm described in section 3.
One of these movements is shown in Fig. 1 d. In each case the task was
to move the tip of the arm from point A to point B on a straight line, with
a biologically realistic bell-shaped velocity profile. The 2 readouts of the neural
microcircuit model are trained by linear regression to output the joint torques
required for movement. More precisely, 20 noisy variations of each of the 4 target
movements were used for the training of the two readouts by linear regression.
Note that each readout is simply modeled as a linear gate with weight vector w
applied to the liquid state x(t) of the neural circuit. This weight vector is fixed
after training, and during validation all 4 movements are generated with this
fixed weight vector at the readouts.
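
The text specifies only that the target movements follow a straight line with a bell-shaped velocity profile; it does not give the formula that was used. As a sketch, assuming a minimum-jerk time course (a common model of bell-shaped reaching profiles, chosen here for illustration and not confirmed by the article), target tip positions for a 500 ms movement sampled every 2 ms could be generated as follows. The function name is hypothetical.

```python
def min_jerk_trajectory(start, end, duration_ms=500.0, dt_ms=2.0):
    """Tip positions on the straight line start -> end with a bell-shaped speed.

    ASSUMPTION: minimum-jerk time course s(u) = 10u^3 - 15u^4 + 6u^5; the
    article only states that the velocity profile is bell-shaped.
    """
    n_steps = int(round(duration_ms / dt_ms))
    points = []
    for k in range(n_steps + 1):
        u = k / n_steps                           # normalized time in [0, 1]
        s = 10 * u ** 3 - 15 * u ** 4 + 6 * u ** 5  # fraction of the path covered
        points.append(tuple(a + s * (b - a) for a, b in zip(start, end)))
    return points
```

The speed along such a path is zero at both ends and peaks at the midpoint of the movement, which gives the bell shape.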

When the circuit receives as input the coordinates (x_dest, y_dest) of the
endpoint B of one of the target movements, the circuit autonomously generates in a
closed loop the torques needed to move the 2-joint arm from the corresponding
initial point A to this endpoint B.

Obviously temporal integration capabilities of the controller are needed for
the control of many types of movements. Generic neural microcircuit models
have inherent temporal integration capabilities (see [Maass et al., 2003]), and
hence can even generate movements that require the arm to stop for a certain period,
and then to move on autonomously. Fig. 4 shows the results for the case where
the readouts from the neural microcircuit have been trained to generate a
stop-and-start kind of motion, with a stop from 225 to 275 ms (see the velocity profile
at the bottom). The initiation of the continuation of the movement at time t =
275 ms takes place without any external cue, just on the basis of the inherent
temporal integration capability of the trained neural circuit. The average deviation
of the tip of the robot arm from the target end point over 20 validation runs was
6.86 cm. For demonstration purposes we chose for the experiment
reported in Fig. 4 a feedback delay of just 1 ms, so that all circuit inputs are



^6 All time steps were chosen to have a length of 2 ms, except for the experiment
reported in Fig. 4, where a step size of 1 ms was used.

constant during 49 ms of the 50 ms while the controller has to wait, forcing the
readouts to recognize just on the basis of the inherent circuit dynamics when to
move on.




Fig. 4. Demonstration of the temporal integration capability of the neural controller. 
Shown is a validation run for a circuit that has been trained to generate a movement 
that requires an intermediate stop and then autonomous continuation of the movement 
after 50 ms. Target trajectories are shown as solid lines, actual trajectories as dashed 
lines. 



5 Generalization Capabilities 

For the experiment reported in Fig. 5, the circuit was trained to generate, from
a common initial position, reaching movements to 8 different target positions,
given in terms of their cartesian coordinates as constant inputs (x_dest, y_dest)
to the circuit. After training the circuit was able to generate with fairly high
precision reaching movements to other target points never used during training.
The autonomously generated reaching movements moved the robot arm on a
rather straight line with bell-shaped velocity profile, just as for those reaching
movements to targets that were used for training.

Fig. 5. a) Generalization of movement generation to 5 target end points (small circles)
which were not among the 8 target end points that occurred during training (small
squares). Movement to a new target end point was initiated by giving its cartesian
coordinates as inputs to the circuit. Average deviation for 15 runs with new target end
points: 10.3 cm (4.8 cm for target end points that occurred during training). b) The
velocity profile for one of the generalized movements (target - solid, actual - dashed).



6 On the Role of Feedback and Sensory Prediction for 
Movement Generation 

Our model assumes that the neural circuit receives as inputs, in addition to
efferent copies of its movement commands (i.e. torques τ1(t), τ2(t)), with little
delay also simulated proprioceptive or visual feedback that provides at time t
information about the actual values of the angles of the joints at time t − Δ.
Whereas it is quite difficult to construct circuits or other artificial controllers
that can benefit significantly from substantially delayed feedbacks, we show in
Fig. 6 a that generic neural microcircuit models are able to generate and control
movements even for substantially delayed feedbacks. In fact, Fig. 6 a shows that
the best performance is achieved not when this delay Δ has a value of 0, but
for a range of delays between 25 and 100 ms. In another experiment, which is
reported in Fig. 6 b, we used a generic neural microcircuit with 800 neurons
(20 × 5 × 8). This microcircuit had two additional readouts which were trained
to estimate at time t the values of the joint angles θ1 and θ2 at time t − 200 ms.

Fig. 6. Deviation of end point of movement averaged over 400 runs (10 randomly
constructed circuits were used; 40 runs per circuit), for the same 4 movements but with
21 different values of the delay Δ for the feedback θ1(t − Δ), θ2(t − Δ). The vertical
bars show the SD of the data. a) Circuit size: 600 neurons. No readouts assigned
to predict the feedbacks. b) Circuit size: 800 neurons. Two additional readouts have
been trained to predict θ1(t − 200 ms), θ2(t − 200 ms). The solid line shows the average
performance when these predictions are not fed back to the circuit. The dashed line
shows the average performance when these predictions were also supplied to the circuit.
Note that the SD in the data when these predictions were fed back to the circuit is
considerably less than that of the case when no such predictions were available.



The top solid line in Fig. 6 b shows the result (computed in the same way as
Fig. 6 a) for the case when the information about these estimates was not fed
back to the circuit. The bottom dashed line shows the result when these estimates
were available to the circuit via feedback. Although these additional feedbacks do
not provide any new information to the circuit, but only condense and reroute
(see Fig. 7) information that was previously spread out all over the circuit
dynamics, this additional feedback significantly improved the performance of
the simulated neural microcircuit for all values of Δ. The values for Δ = 500
ms show the improvement achieved by using predicted sensory feedback in the case
when no feedback arrives at all, since the total movements lasted for 500 ms.
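
The delayed feedback studied in this section can be sketched as a fixed-length delay line placed between the plant and the circuit inputs. This is an illustrative sketch only: the class name and the choice to return an initial value until enough samples have arrived are assumptions, not details given in the article.

```python
from collections import deque


class DelayLine:
    """Return each pushed value again delay_steps time steps later."""

    def __init__(self, delay_steps, initial):
        # ASSUMPTION: before real samples are available, `initial` is returned.
        self._buffer = deque([initial] * (delay_steps + 1), maxlen=delay_steps + 1)

    def push(self, value):
        """Store the newest sample and return the one from delay_steps steps ago."""
        self._buffer.append(value)  # the oldest entry is dropped automatically
        return self._buffer[0]
```

With time steps of 2 ms, a feedback delay of 50 ms would correspond to `DelayLine(delay_steps=25, initial=...)`, with one such line per joint angle.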

7 Theoretical Analysis 

Some theoretical background for the analysis of the computational power of
neural microcircuits in an open loop, such as online speech recognition (see
[Maass et al., 2002]), is provided by two mathematical theorems in the appendix
of [Maass et al., 2002] (see [Maass and Markram, 2002] for details). The main
result is that a sufficiently large neural microcircuit model (which contains
sufficiently diverse dynamic components to satisfy the separation property) can in
principle uniformly approximate any given time-invariant fading memory filter
F. However these theoretical results do not address the problems caused by
internal noise in neural microcircuits, which require a more quantitative
complexity-theoretic analysis that takes into account that differences in liquid states at a
given time t, which are caused by two different input histories u(·) and v(·),
have to be sufficiently large (instead of just ≠ 0) in order to be distinguishable
from state differences caused by internal noise, and it is only meaningful
to require such sufficiently large separation for input histories u(·), v(·) that are
significantly different (hence the dynamics of the neural microcircuit has to be
non-chaotic). Fortunately generic neural microcircuit models tend to have these
properties (see [Maass et al., 2002]).



Fig. 7. Information flow for the case of autonomously generated estimates θ̂(t − 200 ms)
of delayed feedback θ(t − 200 ms). Rest of the circuit as in Fig. 3.

Additional conditions have to be met for successful applications of neural
microcircuit models in closed-loop control tasks, such as those considered in this
article. First of all, one has to assume that the approximation target for the
neural microcircuit, some successful controller F for the plant P^7 (the robot
arm from Fig. 1 c is the example of a plant P discussed in this article), is again
a time-invariant fading memory filter. But without additional constraints on the
plant and/or target controller F one cannot guarantee that neural microcircuits
L that uniformly approximate such a successful controller F can also successfully
control the plant P. In this context we say that F can be uniformly approximated
by neural microcircuit models L if there exists for every ε > 0 some neural
microcircuit model L so that ‖(Fu)(t) − (Lu)(t)‖ < ε for all times t and all input
functions u(·) that may enter the controller. Note that the feedback f from the
plant has to be subsumed by these functions u(·), so that u(t) is in general of
the form u(t) = (u_0(t), f(t)), where u_0(t) are external control commands (both



^7 P should satisfy the well-known bounded input, bounded output (BIBO) criterion in
control theory.

u_0(t) and f(t) are in general multi-dimensional).^8 Assume that such a microcircuit
model L has been chosen for some extremely small ε > 0. Nevertheless the plant
P may magnify the differences < ε between outputs from F and outputs from L
(which may occur even if F and L receive initially the same input u) and produce
feedback functions f_F(s), f_L(s) whose difference is fairly large. The difference
between the outputs of F and L for these different feedbacks may become much
larger than ε, and the outputs of F and L (and of the plant) in this closed loop
may eventually diverge. This situation does in fact occur in the case of a 2-joint
arm as plant P. Hence the assumption that L approximates F uniformly within
ε cannot guarantee that ‖(Fu_F)(t) − (Lu_L)(t)‖ < ε for all t (where u_F(t) :=
(u_0(t), f_F(t)) and u_L(t) := (u_0(t), f_L(t))), since even ‖(Fu_F)(t) − (Fu_L)(t)‖ may
already become much larger than ε for sufficiently large t. On the other hand if
one assumes that the target controller F is “contracting” in the neighborhood of
its intended working range so that ‖(Fu_F)(t) − (Fu_L)(t)‖ stays small if u_F(·) and
u_L(·) did not differ too much at preceding time steps, one can bound ‖(Fu_F)(t) −
(Lu_L)(t)‖ by ‖(Fu_F)(t) − (Fu_L)(t)‖ + ‖(Fu_L)(t) − (Lu_L)(t)‖ and thereby avoid
divergence of the trajectories caused by F and L in the closed-loop system.
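
Written out, the bound in the preceding paragraph is an application of the triangle inequality. The symbol δ(t) is introduced here only for illustration (it does not appear in the article) and stands for whatever bound the contraction property of F provides on the first term:

```latex
\begin{align*}
\|(Fu_F)(t) - (Lu_L)(t)\|
  &\le \|(Fu_F)(t) - (Fu_L)(t)\| + \|(Fu_L)(t) - (Lu_L)(t)\| \\
  &< \delta(t) + \varepsilon .
\end{align*}
```

The second term is smaller than ε because L approximates F uniformly for every admissible input function, in particular for u_L. The first term stays below δ(t) only as long as F is contracting along the trajectory, which is why this additional assumption on F is needed.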

The assumption that the target control filter F is “contracting” in the
neighborhood of its intended working range, i.e. for external controls u_0(t) that are
actually used, was met by the target filters F that the neural microcircuit models
were trained to approximate for the experiments reported in this article: these
target filters F, which were used to generate the target values for training the
readouts of the neural microcircuit models L, were assumed to give for small
variations of their input functions u(·) the same output. Hence they were
contracting in the neighborhood of their intended working range. Through these
considerations it now becomes clear why it was necessary to train the readouts
of the neural microcircuit models L not just for single functions u_F(t) but also
for noisy variations of the ideal input function u_F(t) = (u_0(t), f_F(t)) that arise
from functions f_F(t) produced by the plant P in response to the outputs of the
target controller F: without this training procedure it would have been impossible
to train the neural circuit models to approximate a contracting controller.

8 Discussion 

Whereas traditional models for neural computation had focused on con- 
structions of neural implementations of Turing machines or other offline 
computational models, more recent results have demonstrated that biolog- 
ically more realistic neural microcircuit models consisting of spiking neu- 
rons and dynamic synapses are well-suited for real-time computational tasks 
([Buonomano and Merzenich, 1995], [Maass et al., 2002], [Maass et al., 2003],
[Natschlager and Maass, 2004]). Whereas related work had so far fo- 
cused on sensory processing tasks such as speech recognition or visual 

^8 In our experiments u_0(t) was a very simple 2-dimensional function with value (0, 0)
for t < 0 and value (x_dest, y_dest) for t > 0. All other external inputs to the circuit
were only given during training.

movement analysis ([Buonomano and Merzenich, 1995], [Maass et al., 2002],
[Legenstein et al., 2003]), we have applied such models here for the first time
in a biologically more realistic closed loop setting, where the output of the
neural microcircuit model directly influences its future inputs. Obviously closed loop
applications of neural microcircuit models provide a harder computational chal- 
lenge than open loop sensory processing, since small imprecisions in their out- 
put are likely to be amplified by the plant to yield even larger deviations in the 
feedback, which is likely to increase even further the imprecision of subsequent 
movement commands. This problem can be solved by teaching the readout from 
the neural microcircuit during training to ignore smaller deviations reported by 
feedback, thereby making the target trajectory of output torques an attractor in 
the resulting closed-loop dynamical system. After training, the learned reaching 
movements are generated completely autonomously by the neural circuit once 
it is given the target end position of the tip of the robot arm as (static) input. 
Furthermore the capability of the neural circuit to generate reaching movements 
automatically generalizes to novel target end positions of the tip of the robot 
arm that did not occur during training (see Fig. 5). Moreover, these
autonomously generated new reaching movements exhibit a
bell-shaped velocity profile, as did the previously taught movement primitives.
Surprisingly the performance of the neural microcircuit model for generating 
movement primitives not only deteriorates if the (simulated) proprioceptive feed- 
back is delayed by more than 250 ms, or if no feedback is given at all, but also 
if this feedback arrives without any delay. The best performance is achieved if 
the feedback arrives with a significant delay in the range of 25 to 100 ms. If 
the delay assumes other values, or is missing altogether, a significant improve- 
ment in the precision of the generated reaching movements is achieved after 
additional readouts from the same neural microcircuit models that generate the 
movements have been taught to estimate the values of the feedback with an op- 
timal delay of 200 ms, and if the results of these internally generated feedback 
estimates are provided as additional inputs to the circuit (see Fig. 6 b). Apart 
from these effects resulting from the interaction of the inherent circuit dynamics 
with the dynamics of external or internally generated feedbacks, also the spatial 
organization of information streams in the simulated neural microcircuit plays a 
significant role. The capability of such a circuit to generate movements is quite 
bad if information about slowly varying input variables (such as external or in- 
ternally generated feedback) is provided to the circuit in the form of a firing 
rate of a single neuron (not shown), rather than through the firing activity of a 
spatially extended array of inputs (see description in section 2) as implemented 
for the experiments reported in this article. Thus altogether these results may 
be viewed as a first step towards an exploration of the role of the “embodiment 
of neural computation” in concrete spatially extended neural circuit models and 
their resulting inherent temporal dynamics. This may complement the already 
existing work on the relevance of the embodiment of actuators to motor control 
[Pfeifer, 2002], and might possibly lead to a better understanding of biological
motor control, and also provide new ideas for the design of robot controllers. The
paradigm for movement generation discussed in this article is somewhat related
to preceding work [Ijspeert et al., 2003], where abstract systems of differential
equations were used, and to the melody-generation with artificial neural networks
in discrete time of [Jager, 2002]. In these preceding models no effort was made to
choose a movement generator whose inherent dynamics has a similarity to that
of biological neural circuits. It has not yet been sufficiently investigated whether
feedback, especially feedback with a realistic delay, can have similarly beneficial
consequences in these other models. No effort was made in this article to make
the process by which the neural circuit model (more specifically: the readouts
from this circuit) learns to generate specific movement primitives biologically
realistic. Hence the results of this article only provide evidence that a
generic neural microcircuit can hold the information needed to generate certain 
movement primitives, and once it has this information it can automatically use 
it to generate movements to other given targets. In some organisms such
information is provided through the genetic code, often in combination with
information acquired from observation or trial-and-error. It remains to be explored to what extent a
neural microcircuit model can also learn through trial-and-error (i.e., reinforce- 
ment learning) to execute basic movements, or to combine movement primitives 
to yield more complex movements. We believe that the control framework pre- 
sented in this article, based on a model for a neural system that can be chosen to 
be as complex and biologically realistic as one wants to, provides a quite fruitful 
platform for investigating the possible role and interaction of genetic information 
as well as various biological learning mechanisms, since it allows us to explore
the role of biologically realistic models for neural systems in the context of a
functional closed-loop model where complex real-world movement control tasks
can be addressed.

Possibly some of the results reported in this article also provide new
ideas for tackling complex robot control tasks even in contexts where superior
performance rather than biological realism is required. As indicated in
[Maass et al., 2003], there are various methods for abstracting salient computa- 
tional functions of generic neural microcircuits in ways that can be implemented 
much more efficiently on digital computers than a straightforward simulation of a 
neural microcircuit (see [Nessler and Maass, 2003]). In this way the inspiration 
from biological movement control may give rise to new methods for real-time 
adaptive robot control that help to solve some of the challenging open problems 
in that area.



Abstract. Prey retrieval, also known as foraging, is a widely used test
application in collective robotics. The task consists in searching for ob- 
jects spread in the environment and in bringing them to a specific place 
called nest. Scientific issues usually concern efficient exploration, map- 
ping, communication among agents, task coordination and allocation, 
and conflict resolution. In particular, interferences among robots reduce 
the efficiency of the group in performing the task. Several works in the 
literature investigate how the control system of each robot or some form 
of middle/long range communication can reduce the interferences. In 
this work, we show that a simple adaptation mechanism, inspired by 
ants’ behaviour and based only on information locally available to each 
robot, is effective in increasing the group efficiency. The same adapta- 
tion mechanism is also responsible for self-organised task allocation in 
the group. 



1 Introduction 

Scientific interest in collective robotic systems, in which several independent 
robots work together to achieve a given goal, can have both an engineering and 
a biological origin. From an engineering perspective, systems made of several 
agents are appealing because they represent a way of improving efficiency in 
the solution of tasks that are intrinsically parallel, such as the delivery of items 
in a factory or the exploration of unknown environments. From a biological 
perspective, many robots working together (or in competition) are interesting 
because they are a good test-bed for theories about self-organisation [8]. 

Recently, researchers interested in the design of the control programs for 
groups of robots have taken inspiration from biological systems, where swarms 
of animals are able to solve apparently complex problems. Some of the solutions 
that animals adopt rely on the exploitation of the dynamics brought forth by the 
interactions among agents and between agents and the environment.^1 A control
program that emulates these solutions exploits some features of the environment 

^1 A well known example is that of ants that lay and follow pheromone trails while
foraging. The interplay between the pheromone laid by each ant and its evaporation
makes the shortest path to the food source become the preferred one [6, p. 26-31].

and does not rely on direct means of communication. The robotic field that
studies this approach is referred to as swarm robotics.

We present here a preliminary study of a system that uses swarm-robotic 
techniques. We consider a retrieval task for a group of robots. Taking inspiration 
from prey retrieval in ants, we analyse the effects that a simple form of adaptation 
has on the behaviour of the group. After having introduced the concept of group 
efficiency in retrieval, we show that adaptation is a valid means to improve the 
efficiency and that a self-organised task-allocation phenomenon takes place. 

This work was carried out within the framework of the SWARM-BOTS
project, a Future and Emerging Technologies project funded by the CEC, whose
aim is to design new artifacts that are able to self-assemble and co-operate using
swarm-intelligent algorithms.

This section continues describing the field of swarm robotics and the 
SWARM-BOTS project. Section 2 describes the problem of prey retrieval in 
robots and ants. Section 3 defines the concept of efficiency of retrieval and ex- 
plains our approach to its improvement. Section 4 describes the hardware and 
the software used to run the experiments. Section 5 reports and discusses the 
results we obtained. Section 6 describes related work. Finally, Section 7 draws 
some conclusions and lists possible future directions of research. 



1.1 Swarm Robotics 

Some collective behaviours observed in Nature, such as in ant colonies or other 
animal societies, can be explained without the assumption of direct communi- 
cation among individuals, but only by exploitation of the environment. This is 
a form of indirect communication called stigmergy [12,14]. “In situations where 
many individuals contribute to a collective effort, such as a colony of termites 
building a nest, stimuli provided by the emerging structure itself can be a rich 
source of information for the individual” [8, p. 23]. “In stigmergic labor, it is 
the product of work previously accomplished, rather than direct communication 
among nest-mates, that induces the insects to perform additional labor” [24, 
p. 229]. 

Controllers that use stigmergic communication are usually simple. They are 
often made of reactive behaviours which exploit the dynamics and the complexity 
of the environment itself. The same controller can be used on a large number of
robots of the same kind, that is, in a swarm. The design of such controllers is
the object of study in the field of swarm robotics, which is part of the field of 
swarm intelligence [6]. 

An important issue in swarm robotics regards the understanding of the rela- 
tionship between local and global behaviours in the swarm. The dynamics and 
the factors that play an important role in the group are not easy to identify, to 
model and to control. In this context, a good analysis and understanding of the 
dynamics of the system plays a crucial role. 
Fig. 1. Picture of the prototype of an s-bot.



1.2 The SWARM-BOTS Project 

The aim of the SWARM-BOTS project^2 is to develop a new robotic system,
a swarm-bot, composed of several independent and small modules, called s-bots
(Fig. 1). Each module is self-contained, capable of independent movement and of
connecting with other modules to form a swarm-bot. This process is intended to
be self-organised in order to adapt to dynamic environments or difficult tasks.
Examples of difficult tasks are the pulling of heavy objects or exploration on
rough terrain. Collaboration is achieved by means of stigmergic communication.
The control program of each s-bot uses techniques derived from swarm intelligence
and inspired by similar phenomena observed in biology [3,8].

The project lies between the fields of collective robotics, where robots are au- 
tonomous but do not connect to each other, and of metamorphic robotics, where 
robots need to be always connected and therefore are not fully autonomous. Some 
works in collective robotics are described in Sec. 6. Examples of metamorphic 
robots are described in [17,19,20]. 

2 Problem Description 

The typical environment used in the literature for prey retrieval experiments
consists of (Fig. 2):

- a group of robots, which we also call colony or swarm,

- objects spread in the environment (they may have different sizes and shapes,
they may be fixed or move, they may appear or disappear with some probability
distribution, etc.), which are called prey,

- a special area called home, nest, or target.

^2 For more information on the project see www.swarm-bots.org
Fig. 2. Schema of a prey retrieval task. A group of robots has to collect objects in the
environment (the prey) and to bring them to an area called nest. 



The task the robots have to solve is to find the prey and bring them to the 
nest. To distinguish the robots that actually carry the prey from the others, the 
former are called foragers. 

Prey retrieval, also known as foraging, is among the different tasks that Cao 
et al. [9] consider the canonical domains for collective robotics. It is often used 
as a model for other real-world applications, such as toxic-waste cleanup, search 
and rescue, demining or exploration and collection of terrain samples in unknown 
environments. The main scientific interest concerns the question whether there 
is an actual performance gain in using more than one robot, since the task can 
be accomplished by a single one [9]. Other works in the literature [4,13] identify
interference among robots as the factor that makes the performance grow
sub-linearly with the number of robots.

There are many similarities with foraging in ants. In particular, ants’ foraging
is a collective behaviour exactly as in robotics, so it is natural to look
to it for inspiration. Many aspects are still under study, but the main
features of ants’ foraging can be summarised as follows (Fig. 3) [11,16]:

1. ants explore the environment randomly until one finds a prey;

2. if the prey is not too heavy, an ant tries to pull it to the nest; otherwise, it 
tries to cut it or to use short or long range recruitment; 

3. the prey is pulled straight to the nest (pushing is never observed), both in 
case of individual or collective retrieval; 

4. after the retrieval, the ant returns directly to where it found the prey.

The foraging behaviour of a single ant may be influenced by several factors, 
like age, genetic differences or learning. The role of the latter was studied by 
Deneubourg et al. [10] from a theoretical and numerical point of view. We explain
their model in more detail in the next section.
Fig. 3. Prey retrieval in ants. 1) A forager randomly explores the environment. When it
finds a prey, it selects which action to take. 2) Whether the prey is retrieved alone or
collectively, ants pull (never push) it straight to the nest. 3) After the prey is retrieved,
the ant that found the prey returns straight to the place where it was discovered.



3 Efficiency in Prey Retrieval 

In order to continue the presentation of our work, we need to give a precise 
definition of the term “efficiency” and to explain its role in the collective task of 
prey retrieval. The definition we give comes from the observations of ants. 

Prey retrieval in ants has two components that must be taken into account. 
On the one hand, ants need prey to obtain energy to survive. This is the income 
of the colony. On the other hand, searching also has drawbacks, that can come 
from dangers in the environment, from the interferences among nest-mates (such 
as blocking the way to other ants, or collisions that slow down their speed), or 
from the fact that ants spend energy to move. All these are the costs of the 
colony. 

Income and costs depend on the number of foragers X. They both increase
with X, but not in the same way. The income saturates when X is too big (ants
cannot retrieve more prey than are actually present in the environment, even
if the number of foragers X is doubled), but costs potentially increase without
limit. There is a point beyond which costs are higher than the income. If we define
the efficiency of the group as

$$\eta = \frac{\text{income}}{\text{costs}} \qquad (1)$$

there is a value of X that maximises it. That is, if that number of foragers is
used, the cost of retrieval per prey is minimal.
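To illustrate why such a maximum exists, the following minimal sketch uses two hypothetical functional forms that are not taken from this work: an income that saturates at the number of available prey, and costs made of a fixed part plus a part that grows linearly with X. All numerical values are illustrative assumptions; only the qualitative shape (saturating income, unbounded costs) comes from the text.

```python
import math


def income(x, prey_available=10.0, scale=3.0):
    """Assumed saturating income: it never exceeds the prey available."""
    return prey_available * (1.0 - math.exp(-x / scale))


def costs(x, fixed=2.0, per_forager=1.0):
    """Assumed costs: a fixed part plus a part that grows without limit with X."""
    return fixed + per_forager * x


def efficiency(x):
    """Efficiency of the group as income divided by costs, as in (1)."""
    return income(x) / costs(x)


def best_number_of_foragers(max_foragers=50):
    """Return the integer X in 1..max_foragers that maximises the efficiency."""
    return max(range(1, max_foragers + 1), key=efficiency)


best = best_number_of_foragers()
```

With these assumed curves the efficiency first rises and then falls, so the maximum lies at an intermediate number of foragers rather than at either extreme.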

We indicate as “performance of the colony” the number of retrieved prey, 
that is, the income. Note that the words “performance” and “efficiency” have 
been used with different meanings in the robotics literature. For instance, “per- 
formance” refers to the time it takes to retrieve all the prey in the environment 
in some works, or the number of retrieved prey in others. We claim that these 
definitions depend on the particular application the researchers have in mind for 
their group of robots. For instance, time is an important factor in case of search 
and rescue applications, but the number of collected items is more interesting 
in the case of spatial exploration. We are more interested in the latter
application, which more closely resembles ants’ foraging. Since ants never
stop looking for food, a concept of performance based on time is useless in our
context.

Ants suggest some ways to improve the efficiency of a swarm of robots in 
a prey retrieval task, but most of them are hard to implement. For instance, 
they use recruitment and stigmergic communication, which unfortunately rely 
on chemical substances whose effect can not be easily emulated with robots. 
Evolution surely played an important role in tuning the behaviours of ants for 
optimal foraging in particular environments, but a solution based on evolutionary 
robotics [18] requires too much time on real robots and, when it is used on-line, 
is too slow in case of sudden changes in the environment. 

Deneubourg et al. [10] suggest that ants use life-time adaptation. The authors 
model each ant with an agent that has a probability to leave the nest Pi, which is 
modulated according to previous successes or failures. If an ant retrieves a prey, 
its Pi increases. If it spends too much time without success, its Pi decreases.
They show that this hypothesis can explain some of the patterns in ants’ foraging 
behaviour. 

We expect, if we use a similar algorithm for our swarm of robots, to observe 
the following effects: 

Efficiency increase : if there were many more robots than prey, many robots 
would not be successful in retrieval. They would decrease their Pi and spend 
more time in the nest, leaving more room for the others to work. If there were 
far fewer robots than prey, some robots would eventually exit the nest and 
be successful in retrieval, increase their Pi and spend more time in foraging. 
The efficiency of the group would improve in both cases without external 
intervention: that is, the improvement is self-organised. 

Task allocation : some robots would retrieve by chance more prey than the 
others, and therefore their Pi would increase; therefore, they would spend 
more time in foraging. But, the more the time spent, the more prey they 
would retrieve and the higher their Pi would become. This is an amplification 
phenomenon typical of many biological systems [8]. The opposite would hold 
true for those robots that were less successful. After a while, two classes of 
robots would appear in the environment, allocated to two tasks: foragers, 
with high Pi and that retrieve prey to the nest, and loafers, with low Pi and 
that prefer to stay in the nest. The allocation is, again, self-organised. 

The robots described in the next section and the experiments of Sec. 5 are
meant to test these two hypotheses. Our robots do not use direct communication
and interactions among them are only indirect. For instance, a successful robot
decreases the density of prey and therefore influences the behaviour of its nest-mates.

4 Hardware and Software 

We used real robots instead of simulation. The latter offers many advantages,
such as speed and reliability, but it is based on a model of the environment in
which important aspects might be missing or badly approximated. The work
described in this paper also aims to identify the physical features
of the environment that must be modelled in a simulator. The results of our
experiments will be used to tune a simulator in future work.

Fig. 4. Picture of a MindS-bot: (a) front view; (b) side view.

No s-bot was yet available at the time of the experiments. Therefore, in our
experiments we used Lego Mindstorms™ to build the robots, which we call
MindS-bots (Fig. 4). MindS-bots are based on a Hitachi H8300-HMS 1 MHz
microprocessor with 32Kb RAM. They have one light sensor and one bumper
on both the front and the back side, for a total of four sensors. The traction
system, based on two tracks controlled by two motors, resembles the one of the
s-bot. Two arms on the front side form the gripper that is used to grasp prey
and that is controlled by another motor.

BrickOS,^3 a POSIX-like operating system, runs on the MindS-bots. The control
program is written in C and then downloaded onto the robots. The finite state
machine in Fig. 5 represents the control program of the MindS-bots. Different 
states are the different phases of prey retrieval, that is, the sub-tasks in which 
the overall prey retrieval task is decomposed. These sub-tasks are as follows: 

Search : the MindS-bot looks for a prey by randomly exploring the environment
(as ants do) and changing direction when a bumper is pressed. If a prey is
found, the MindS-bot grasps it. If a timeout occurs before a prey has been
grasped, the MindS-bot gives up foraging.

Retrieve : the MindS-bot looks for the nest and pulls the prey toward it. Since 
the gripper is on the front, the MindS-bot uses the sensors on its back for 
this purpose. 

Deposit : the MindS-bot leaves the prey in the nest and turns toward the point 
from which it came (to mimic ants’ behaviour). 

Give Up : the MindS-bot looks for the nest and returns to it. 

Rest : the MindS-bot rests in the nest. 

^3 http://brickos.sourceforge.net/
Fig. 5. Sketch of the control system of a MindS-bot. The states represent different
phases of retrieval (see text). The labels on each edge represent the conditions under 
which the MindS-bot changes state. The transition from Rest to Search is based on the 
probability Pi. The transition from Deposit to Rest represents a successful retrieval 
(Pi is increased), the one from Search to Give Up is a failure (Pi is decreased). 



Algorithm 1 Adaptation mechanism: Variable Delta Rule

initialisation: successes ← 0; failures ← 0; Pi ← Pinit
if prey retrieved then
    successes ← successes + 1; failures ← 0
    Pi ← Pi + successes * Δ
    if Pi > Pmax then
        Pi ← Pmax
    end if
else
    if timeout then
        successes ← 0; failures ← failures + 1
        Pi ← Pi - failures * Δ
        if Pi < Pmin then
            Pi ← Pmin
        end if
    end if
end if



Transitions between states occur when the labels on the edges in Fig. 5 
are true, except the one from Rest to Search which is controlled by Pi. The 
transition from Search to Give Up represents a failure in retrieval, whilst the 
transition from Deposit to Rest is a success. 
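The state machine described above can be sketched as a transition table. This is a minimal illustration, not the C program that ran on the MindS-bots: the event names (prey_grasped, timeout, nest_reached, prey_released) are hypothetical stand-ins for the edge labels of Fig. 5, and the assumption that the Rest to Search transition is sampled once per call is ours.

```python
import random
from enum import Enum, auto


class State(Enum):
    SEARCH = auto()
    RETRIEVE = auto()
    DEPOSIT = auto()
    GIVE_UP = auto()
    REST = auto()


# Hypothetical event names standing in for the edge labels of Fig. 5.
TRANSITIONS = {
    (State.SEARCH, "prey_grasped"): State.RETRIEVE,
    (State.SEARCH, "timeout"): State.GIVE_UP,       # failure in retrieval
    (State.RETRIEVE, "nest_reached"): State.DEPOSIT,
    (State.DEPOSIT, "prey_released"): State.REST,   # successful retrieval
    (State.GIVE_UP, "nest_reached"): State.REST,
}


def next_state(state, event=None, p_leave=0.0, rng=random):
    """Return the state that follows `state` given an event.

    Leaving Rest is the only probabilistic transition: the robot starts
    searching with probability p_leave, otherwise it keeps resting.
    Unknown (state, event) pairs leave the state unchanged.
    """
    if state is State.REST:
        return State.SEARCH if rng.random() < p_leave else State.REST
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a table makes the two edges that trigger adaptation (Search to Give Up and Deposit to Rest) easy to locate.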



4.1 Adaptation Mechanism 

MindS-bots adapt their Pi according to the algorithm depicted in Alg. 1, called
Variable Delta Rule (VDR). Its parameters are: Δ (the base increment), Pmin and
Pmax (the minimum and maximum values that Pi can reach) and Pinit (the initial
value of Pi). Two counters store the number of consecutive successes and failures
of the MindS-bot; the relevant counter multiplies Δ before the product is added to
or subtracted from Pi. The effect is that the reward is much bigger for those
MindS-bots that can repeatedly retrieve prey than for the others.

Fig. 6. Snapshot of an experiment. Four MindS-bots are looking for three prey. The
nest is indicated by a light in the centre. One MindS-bot is resting in the nest and
is therefore inactive. Two others are exploring the environment. The fourth has found
a prey, grasped it, and is searching for the nest to retrieve it.
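The Variable Delta Rule of Alg. 1 can be sketched in a few lines. This is an illustrative Python version, not the C code used on the robots; the class and method names are our own, and the default parameter values are the ones reported in Sec. 5.

```python
from dataclasses import dataclass


@dataclass
class VariableDeltaRule:
    """Illustrative version of Alg. 1; names are assumptions, not the authors'."""
    delta: float = 0.005     # base increment
    p_min: float = 0.0015    # lower bound of the probability to leave the nest
    p_max: float = 0.05      # upper bound of the probability to leave the nest
    p_leave: float = 0.033   # current probability, initialised to Pinit
    successes: int = 0       # consecutive successful retrievals
    failures: int = 0        # consecutive search timeouts

    def on_success(self):
        """Call when a prey has been retrieved."""
        self.successes += 1
        self.failures = 0
        self.p_leave = min(self.p_max, self.p_leave + self.successes * self.delta)

    def on_failure(self):
        """Call when the search timeout expires without a prey."""
        self.failures += 1
        self.successes = 0
        self.p_leave = max(self.p_min, self.p_leave - self.failures * self.delta)
```

Because the step is multiplied by the length of the current streak, three successes in a row starting from Pinit already saturate the probability at Pmax, and four consecutive failures drive it down to Pmin.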



5 Experiments and Results 

For the experiments, we used a circular arena (Fig. 6) with a diameter of 2.40 m.
A light bulb is used to signal the position of the nest in the centre of the arena.
Walls and floors are white, prey are black. The search timeout is fixed to 228 s.^4
Δ is set to 0.005, Pmin to 0.0015, Pmax to 0.05, Pinit to 0.033. Prey appear
randomly in the environment. The probability that a prey appears each second
is 0.006. A new prey is placed randomly in the arena so that its distance from
the nest is in [0.5 m, 1.1 m]. Values were chosen on the basis of a trial-and-error
methodology.
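The prey-appearance process can be sketched as one random trial per second. The sketch below assumes that the distance from the nest is drawn uniformly in [0.5 m, 1.1 m] and the direction uniformly over the full circle; the text only states the range, so both distributions are assumptions, as are the function and parameter names.

```python
import math
import random


def sample_prey_arrivals(duration_s=2400, p_per_second=0.006,
                         r_min=0.5, r_max=1.1, seed=None):
    """Return a list of (time_s, x_m, y_m) tuples, with the nest at the origin."""
    rng = random.Random(seed)
    arrivals = []
    for t in range(duration_s):
        # One Bernoulli trial per second with the reported probability.
        if rng.random() < p_per_second:
            distance = rng.uniform(r_min, r_max)      # assumed uniform in distance
            angle = rng.uniform(0.0, 2.0 * math.pi)   # assumed uniform in direction
            arrivals.append((t, distance * math.cos(angle),
                             distance * math.sin(angle)))
    return arrivals
```

Under this one-trial-per-second reading, the expected number of prey in a 2400 s run is 2400 x 0.006 = 14.4.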



5.1 Efficiency of the System 

The first of our hypotheses is that the use of adaptation of Pi increases the 
efficiency of the foraging task with respect to a group that does not use it and 
whose available robots are all foragers. In order to test this hypothesis, it is not 
possible to look at the value of (1) in the colony because the costs cannot be 
quantified. In fact, they comprise too many factors, some of which are unknown. 

^4 This value is the estimate of the median time needed by a single MindS-bot to find
one prey when it is alone in the arena. 
Thus, as efficiency index we used

$$\nu = \frac{\text{performance}}{\text{colony duty time}}, \qquad (2)$$

where colony duty time is the sum of the time each MindS-bot of the colony spent
in searching or retrieving and performance is the number of retrieved prey. The
colony duty time is directly related to the costs: the higher it is, the higher is
the probability that some robot gets lost or breaks down, the higher the energy
consumption, and so forth. Therefore, if ν increases, the efficiency η defined in (1)
increases too.
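As a sketch of how the efficiency index could be computed from experiment logs, the code below assumes a hypothetical log format (one list of (state, seconds) pairs per robot) and counts only the time spent searching or retrieving, as in the definition above. The state names and the numbers in the usage note are illustrative, not data from the experiments.

```python
# States whose duration counts as duty time, following the definition in the text.
DUTY_STATES = {"search", "retrieve"}


def colony_duty_time(logs):
    """Sum the seconds that every robot spent searching or retrieving.

    `logs` holds one list of (state, seconds) pairs per robot (assumed format).
    """
    return sum(seconds
               for log in logs
               for state, seconds in log
               if state in DUTY_STATES)


def efficiency_index(retrieved_prey, logs):
    """Number of retrieved prey divided by the colony duty time."""
    duty = colony_duty_time(logs)
    if duty <= 0:
        raise ValueError("colony duty time must be positive")
    return retrieved_prey / duty
```

For example, a colony whose robots accumulated 150 s of searching and retrieving while bringing back 3 prey would score 3 / 150 = 0.02 prey per second of duty time.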

The quantitative improvement of the efficiency was measured in a series of
ten experiments lasting 2400 s each. Four MindS-bots out of a pool of six were
used in each experiment, and they were changed between experiments. For each
experiment, we conducted a control, with the same robots, in which Pi was not
adapted and was set to 1. Moreover, the prey in the control experiments did not
appear randomly, but at the same time and place as in the original experiments.
Figure 7 plots the mean value of ν over time for both experiments. The difference
is statistically significant after 1400 s.^5
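For readers unfamiliar with the sign test for paired data, the following sketch computes its exact one-sided p-value. The function name is ours. With ten pairs, nine favourable pairs give 11/1024, about 0.01074, and ten favourable pairs give 1/1024, about 0.00098; these coincide with the p-values reported for the sign test on ν, although the counts of favourable pairs themselves are not stated in the text and are an inference.

```python
from math import comb


def sign_test_p_value(n_pairs, n_favourable):
    """One-sided exact sign test.

    Returns P(K >= n_favourable) for K ~ Binomial(n_pairs, 1/2), i.e. the
    probability of seeing at least that many favourable pairs if both
    conditions were equally likely to come out ahead in each pair.
    """
    tail = sum(comb(n_pairs, k) for k in range(n_favourable, n_pairs + 1))
    return tail / 2 ** n_pairs


p_nine_of_ten = sign_test_p_value(10, 9)   # 11 / 1024
p_ten_of_ten = sign_test_p_value(10, 10)   # 1 / 1024
```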

The colony that uses adaptation is more efficient because it has less colony
duty time. The ratio between the final ν in the two colonies is 1.41, but the ratio
of their performances (Table 1) is only 1.04. Moreover, there is no statistical
difference in the performances between the two colonies.^6 Therefore, we deduce
that the improvement is due to the colony duty time. This means that, in the
colony that uses adaptation, foragers can do their job more easily and in less time
because there is less interference.

When the Variable Delta Rule was used, there were 2.57 foragers and 2.44
prey on average in the arena in the period between 1000 s and 2400 s (Fig. 8(a)).
In the control experiments, there were 3.63 foragers and 3.49 prey (Fig. 8(b)). In
both cases the ratio is nearly one robot per prey, but there are fewer robots out of
the nest when adaptation is used. We were surprised to see that fewer foragers
did not correspond to a worse performance. The explanation could be that,
in our setup, 3.63 MindS-bots represent an overcrowded environment
in which there are many interferences and in which the robots cannot do
their job well.



5.2 Task Allocation 

The second hypothesis concerns task allocation. We expect that the adaptation
mechanism leads to the creation of two classes of MindS-bots in the colony:
foragers and loafers. The task of the first group is to search for and retrieve prey,
while the second group stays in the nest and avoids interfering with the activity
of the others.

^5 Sign test for paired data [23, pp. 80-87]. Null hypothesis: ν is the same in the two
colonies. The p-value is 0.01074 from 1400 s to 1500 s and 0.00098 from 1500 s on.

^6 Permutation test on the data of Table 1. This kind of non-parametric test is among
the most powerful because it uses all the information available in a data set [23,
p. 95]. Null hypothesis: the performances are the same. Alternative hypothesis: the
colony with adaptation performs better. The p-value is 0.2637.

Fig. 7. Value of ν (2) when Pi is adapted (continuous line) and when it is not (dashed
line). The average over 10 experiments is plotted. Vertical lines represent the standard
deviation. The two curves are statistically different after 1400 s (see footnote 5).

Table 1. Performances of colonies with and without adaptation of Pi. The first column
contains the experiment number and the second the total amount of prey that appeared
during the experiment. The third column is the number of prey retrieved when the
VDR is used, the fourth refers to the control experiments. The last row sums the
results. An asterisk indicates which setup retrieved more prey. There is
no statistical difference in the performance of the two colonies.

Exp.     Tot. prey    Prey retrieved      Prey retrieved
                      with adaptation     without adaptation
1        15           14*                 13
2        17           14                  14
3        12            8*                  7
4        18           12*                 11
5        16           11                  12*
6        21           18*                 15
7        14           10                  12*
8        16           12                  15*
9        16           16*                 14
10       24           19*                 15
Total:   169          134                 128

(a) With adaptation. Mean number of MindS-bots and prey when the VDR was used.
Between 1000 s and 2400 s, when the system is in its steady state, there are on
average 2.57 foragers and 2.44 prey.

(b) Without adaptation. Mean number of MindS-bots and prey in control experiments
with Pi = 1. Between 1000 s and 2400 s, when the system is in its steady state, there
are on average 3.63 foragers and 3.49 prey.

Fig. 8. Mean number of foragers and prey observed during the first 2400 seconds in
experiments with adaptation and in control experiments (horizontal axis: time in
seconds). The continuous line represents MindS-bots, whilst the dotted one
represents prey. Vertical lines show the standard deviation. Data is collected
over 10 experiments.

The only means by which the VDR can allocate tasks is the modification of
Pi. We consider an instant t at which the colony has reached its steady state. The
value of Pi of each MindS-bot at time t is a random variable that assumes different
values for different experiments according to an unknown distribution. The estimate of
this distribution can give us enough information to test our hypothesis: if it is a
single-peak distribution, then there is no separation in classes; if there are two
peaks, then task allocation occurs.

We collected the value of Pi of each MindS-bot in the ten experiments of
Sec. 5.1 after 2400 s (four robots times ten experiments) and we obtained the
histogram in Fig. 9. We deduce from its U-shape that the MindS-bots have a higher
probability of having Pi next to one of the two peaks of the distribution. 60% of
the MindS-bots in the population have Pi < 0.02 and represent the loafers. The
remaining 40% have Pi > 0.025 and are the foragers. Few MindS-bots have Pi
around 0.02 and 0.025, suggesting that the VDR prevents the maintenance of a
high fraction of unspecialised robots by pushing them toward one of the two peaks.

Since we ran few experiments, it could be that the right peak of the
distribution is due to a few lucky experiments. Table 2 shows on the contrary that
foragers were present in nearly all the experiments.^7

^7 No foragers were present in experiment 3, which is also the one in which the
fewest prey appeared, as shown in Table 1.
Fig. 9. Histogram of the observed Pi in the MindS-bots at 2400 s (horizontal axis:
probability to leave the nest, from 0.00 to 0.05; vertical axis: frequencies after
2400 s). The distribution has two peaks, showing that some of the MindS-bots are
allocated to the foraging task (those with high probability) and the others are
loafers (low probability). 60% of the observations are below 0.025. Data refers to
ten experiments, four MindS-bots per experiment.

Table 2. Number of MindS-bots that are foragers and loafers per experiment. The
presence of foragers is nearly systematic. An asterisk marks the only experiment
without foragers. Data refers to ten experiments, four MindS-bots per experiment.

Exp.    # loafers    # foragers
1       3            1
2       3            1
3*      4            0
4       1            3
5       3            1
6       2            2
7       3            1
8       2            2
9       1            3
10      2            2
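The classification used in this section can be sketched as a simple thresholding of the probability to leave the nest, using the two thresholds quoted in the text (0.02 and 0.025). The function name and the label for intermediate values are our own. The per-experiment counts below are copied from Table 2, and summing them gives 24 loafers and 16 foragers out of 40 robots, consistent with the 60% and 40% stated above.

```python
LOAFER_MAX = 0.02     # below this value a robot is counted as a loafer
FORAGER_MIN = 0.025   # above this value a robot is counted as a forager


def classify(p_leave):
    """Label a robot from its probability to leave the nest."""
    if p_leave < LOAFER_MAX:
        return "loafer"
    if p_leave > FORAGER_MIN:
        return "forager"
    return "unspecialised"


# (loafers, foragers) for experiments 1 to 10, copied from Table 2.
TABLE_2 = [(3, 1), (3, 1), (4, 0), (1, 3), (3, 1),
           (2, 2), (3, 1), (2, 2), (1, 3), (2, 2)]

total_loafers = sum(loafers for loafers, _ in TABLE_2)
total_foragers = sum(foragers for _, foragers in TABLE_2)
loafer_fraction = total_loafers / (total_loafers + total_foragers)
```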



6 Related Work 

Interferences among robots, mostly due to physical collisions, are known to be a 
problem in collective robotic tasks [4,13]. We briefly discuss here other techniques 
that have been developed to reduce them. 

A group of solutions to the interference problem works at the level of the 
control program. For instance, Schneider-Fontan and Mataric [22] introduced a 
priori territoriality in their programs, so that each robot is assigned to a region of 
the environment and can not trespass its borders. Each robot brings the objects 
in its area to the border nearest to the nest, where another robot takes care 
of them. Balch [4] reimplemented, for a retrieval task with more prey types, 
this approach and two others (homogeneous controls and robot specialisation 
by item type). He noticed that robots with homogeneous controllers gave the 
best performance. The two works reach contradictory conclusions, but in [4] the 
author suggests two possible reasons: bad reimplementation of the algorithms or
different set of test conditions. 

Another group of solutions relies on inter-robot communications. Balch and 
Arkin [5] tested the effects of increasing communication complexity, starting 
from no communication, on the performance of a group of robots. They tested 
the group with different tasks, including foraging. Their conclusion is that com- 
munication appears unnecessary when implicit communication already exists, 
but it significantly improves the performance in the other cases. The difference 
between simple and complex forms of communication is negligible. Rybski et
al. [21] tested the influence of signalling on foraging with respect to a system
without communication, but no statistical difference was found.

Hayes [15] follows an analytical approach. He provides an equation to suggest 
the optimal number of robots to use in an environment with given conditions. 
His approach requires however some knowledge of the environment, which is not 
always available. 

Agassounon et al. [1,2] follow a swarm-intelligent approach. They use a 
threshold-based model developed in biology [7] to allocate tasks among the 
robots in a clustering experiment. Each robot switches to the execution of a 
task only if the level of an external stimulus is higher than a threshold. In their 
case, the stimulus is given by the time spent to search for an object. If it is higher 
than the threshold, the robot goes back to the nest, reducing the total number 
of foragers. Once a robot stops exploring the environment, it cannot switch
back to the search behaviour. Their approach works well for a clustering task,
but it could have some problems in prey retrieval, where the density of prey can
change abruptly.

7 Conclusions 

We showed that a group of robots can self-organise in order to work more
efficiently using only a form of adaptation, inspired by ants’ behaviour, that is
based on information locally available to each robot. The only communication
used is that which takes place through the environment itself as a medium.
Interferences, which are considered a negative factor in robotic retrieval, are
exploited as a source of information: robots perceive their effects, e.g. a failure,
and adjust their behaviour accordingly. The amplification mechanism of the
adaptation, combined with random fluctuations, is also responsible for the task
allocation.

Future research will follow different paths. First of all, we will analyse the 
effects of changing the initial density of robots. Then we will also study the self- 
regulatory mechanism of this adaptation by changing the distribution of prey 
during the experiments. The aim is to relax some of the constraints that define 
the experimental environment that we used. Furthermore, we will study the 
robustness of the adaptation to the perturbation of its parameters. For instance,
if Δ decreases, we expect the dynamics to be slower and, if Δ increases, we expect
the system to show an oscillatory behaviour in which, for example, MindS-bots
frequently switch back and forth between low and high Pi values.
Abstract. We present a detailed account of the processing that occurs
within a biologically-inspired model for visual homing. The Corner Gra- 
dient Snapshot Model (CGSM) initially presented in [1] was inspired by 
the snapshot model [2] which provided an algorithmic explanation for the 
ability of honeybees to return to a place of interest after being displaced. 
The concept of cellular vision is introduced as a constraint on processing.
A cellular vision matrix processes visual information using retinotopically
arranged layers of low-level processing elements interacting locally.
This style of processing reflects general principles known of visual pro- 
cessing throughout the animal kingdom. From a technical standpoint, 
this style of processing is inherently parallel. Here we describe a cellular 
vision matrix which implements CGSM and illustrate how this matrix 
obeys cellular vision. Some new comparative results are presented and 
it is found that CGSM’s performance degrades gracefully with environ- 
mental modification and occlusion. 



1 Introduction 

In [1] we presented a biologically-inspired model for visual homing. This model 
was inspired by the snapshot model, proposed to explain the ability of honey- 
bees to return to a place of interest after displacement [2]. Our model, called 
the Corner Gradient Snapshot Model (CGSM), was found to achieve successful 
homing on a dataset of real-world panoramic images. It outperformed a similar 
biologically-inspired model [3,4] on the same images. CGSM has two distinguish- 
ing features. First, it operates on real-world two-dimensional images, in contrast 
to most other models inspired by the snapshot model which operate on one- 
dimensional images, often taken of simulated or simplified worlds [5,6,7,8,9,10, 
11,3]. Secondly, all of CGSM’s processing is constrained to involve only local low- 
level interactions between neuron-like elements. We call this constraint cellular 
vision. Here our main purpose is to show how CGSM adheres to cellular vision 
by providing detailed wiring diagrams. We also present some new results which 
indicate that CGSM’s performance will degrade gracefully when the environment 
is significantly modified. 



A.J. Ijspeert et al. (Eds.): BioADIT 2004, LNCS 3141, pp. 290—305, 2004. 
1.1 Visual Homing

A critical competence for an agent, whether animal or robot, is the ability to 
repeatedly return to important places in the environment such as the nest or food 
sources. If this is achieved by exploiting visual cues then we call it visual homing. 
The snapshot model (described below) is a possible model for visual homing in 
the honeybee. With the snapshot model, a single “snapshot” image taken from 
the goal location is all that is stored to represent the environment. A number 
of alternative approaches have been proposed for visual homing, particularly in 
robotics. Basri et al. employ formal computer vision techniques to determine the
epipolar geometry relating the snapshot and current locations and move directly 
home [12]. Memory-based approaches represent the environment by a set of 
images and associated home vectors [13,14,15]. Some memory-based approaches 
employ panoramic imaging systems as we do here (e.g. [14,15]). If the agent never 
has to leave the vicinity of the goal then the feature tracking approach of [16] can 
be employed for homing (this approach also uses corners as we do here). However, 
none of these more technical approaches considers the biological plausibility of
its algorithms. Thus, we are inspired here by the snapshot model and we have
attempted to provide an implementation of it which is both biologically plausible
and efficient.



1.2 The Snapshot Model 

The snapshot model was developed to match data of honeybee search patterns 
[2]. A model agent is placed at the goal and allowed to capture a snapshot image. 
It is then displaced and allowed a return attempt. The disparity between the 
current and snapshot images is used to guide the return. One key requirement of 
the snapshot model is that the agent maintains a constant orientation. There is 
evidence to suggest that bees take on the same orientation when returning to a 
remembered place as they took when originally learning the layout of that place 
[17,18,19]. A robot homing via the snapshot model must employ some sort of 
compass system to maintain or compensate for changes in orientation. 



1.3 Cellular Vision 

The neurophysiology of visual homing in insects has not yet been probed [11]. 
How then can we posit an insect-inspired neural architecture to implement visual 
homing? The answer is to step back and look at the overall principles that seem 
to govern the processing of visual information in insects and other animals. 

The lower layers in human visual cortex are composed of neurons exhibiting
differently structured receptive fields [20]. In their pioneering work on cat visual
cortex, Hubel and Wiesel state, “the receptive field of a cell in the visual system
may be defined as the region of retina (or visual field) over which one can
influence the firing of that cell.” [21]. Further, visual processing neurons generally
appear to be organized into columns with the retina on the bottom and
increasingly elaborate, yet overlapping, receptive fields found as one ascends.
This arrangement is retinotopic: “each level of the system is organized like a
map of the retina.” [20]. These observations also hold for insects. In describing
motion processing within the framework of the fly’s visual system, Egelhaaf et al.
state “the visual system is organized in a retinotopic way by columnar elements”
[22]. They describe how local image motion is detected by small-field neurons
which subsequently connect to wide-field neurons. The small-field cells, called
Elementary Motion Detectors (EMDs), compute local motion through lateral
interactions between neighbouring cells on the same level within adjacent columns.
To summarize, both vertebrates and invertebrates have visual systems that are 
structured according to the following general principles: Vertical arrangement of 
retinotopic columns; Horizontal arrangement into layers of heterogeneous func- 
tion; Local processing of neurons that interact with other neurons in nearby 
columns and layers. 

We refer to the use of these principles in artificial vision as cellular vision, and
a matrix of processing elements adhering to these principles as a Cellular Vision 
Matrix (CVM). The CVM explored here was designed to produce movement 
vectors from images. Thus, at some point the dimensionality of the processing 
matrix must be brought down to the two dimensions of a vector describing 
motion in the plane. This is done in the final two layers of the CVM which exhibit 
very wide receptive fields. This aspect is also incorporated into the cellular vision 
concept and is inspired by the increasingly wide receptive fields found as one 
ascends from the retina into the visual processing areas of many animals. 

A number of other researchers who have built computational and/or robotic 
models inspired by insects seem to be implicitly adhering to cellular vision. In [23] 
a robot inspired by the fly motion detection system has an array of hardware- 
implemented EMD units arranged retinotopically in a ring around the robot. 
Similar fly-inspired examples can be found in [24,25]. [26] presents a retinotopic 
neural model for the escape reflex of locusts. Additional examples of retinotopic 
neural models of insect vision include two others based also on the snapshot 
model [10,11]. Of these, the neural snapshot model [10] is of primary interest here 
because it provides the inspiration for CGSM. The primary difference is that
the neural snapshot model operates only in a simplified simulation environment 
on one-dimensional binary images. 

Two notes before concluding this section. First, adopting cellular vision rules 
out a host of more traditional computational methods. Chief among these is 
search. Interestingly, most of the existing implementations of the snapshot model 
utilize search and are therefore not readily implementable in a CVM (exceptions 
in [10,11]). Second, a CVM is inherently parallel. Here and in [1] the CVM for 
CGSM is run on a serial computer. However, parallel implementation must be 
imagined to allow a fair comparison with non-CVM methods. If simulating a 
CVM with m layers at resolution n x n on a serial computer, the time complexity
to complete processing of one image would be O(mn^2). However, if the
implementation is parallel the time complexity is just O(m).
2 The CVM for CGSM



The CVM which implements the CGSM homing method consists of 29 layers. 
Each layer is composed of a grid of elements equal in size to the input image, 
all sharing the same function. A group of adjacent layers that together performs 
some higher-level function is called a segment. We begin by describing the pro- 
cessing matrix at a high-level and then descend to describe how segments and 
groups of segments achieve the necessary high-level functions in low-level terms. 

A flow chart of the CGSM processing matrix is depicted in figure 1. An input 
image is fed into the processing matrix. It is first smoothed and then corners 
are extracted as features. If the agent is at the goal then the image of features 
is stored as the snapshot image. Gradients are formed around each feature and 
these gradients are locally compared with features in the snapshot image to 
generate vectors which indicate the direction that these features have moved in. 
These vectors specifying motion in the image are mapped onto vectors specifying 
the corresponding motion of the agent. Finally, this last set of vectors is summed 
to create the agent’s home vector. 




[Figure 1: flow chart. Recoverable labels: Current Image Features; Memory; Gradient Formation.]






Fig. 1. Processing applied by the CGSM matrix. Adapted from [1]. 
Before describing the CVM for CGSM in depth we look at it from a more
removed perspective. Figure 2 shows the entire CVM, visualized in 3-D with 
OpenGL. A single column of the CVM is shown. Note how many layers are dis- 
placed diagonally. These displacements have been added to make clearly visible 
the connections between layers. Processing flows from the bottom right up to 
the upper left. All subsequent figures of the CVM will follow this convention. 
These figures are all zoomed-in views of particular parts of figure 2 with addi- 
tional labels and effects added. Note the circled summation sign at the upper 
left of the matrix. This indicates that the next-to-last and last layers are to be 
summed to form the x and y components (respectively) of the final agent motion 
vector that is output from the CVM. The next section provides detail on how 
the low-level structure of a CVM is to be described. Subsequent sections present 
each high-level component of the processing matrix in low-level terms. 




[Figure 2 labels: Gradient Formation; Snapshot; Local Maxima Extraction; Gaussian Filter; X Derivative; Gaussian; Input Layer; Vector Summing; Vector Normalization.]



Fig. 2. Overview of CVM for CGSM, visualized in OpenGL. 



3 CVM Structure 

We can describe a CVM by giving just the neighbourhood and function of each 
processing element along a column that ascends from the input layer to the 
last layer of the CVM. The input layer can be considered the ‘retina’. By the 
time processing reaches the CVM’s last two layers the input image will have 
been transformed into a vector which will directly control the agent’s behaviour. 
The neighbourhood of an element within a CVM describes which of its peers it 
receives input from and the weighting of their connections with that element.
The function achieved by each processing element is one of the following: 

Conv. The sum of all neighbours’ weighted activation levels is returned. This is
the convolution operation, prevalent in signal processing.

Mult. The product of all neighbours’ weighted activation levels is returned.

Max. The highest weighted activation level is returned.

IsMax. If the special neighbour has the highest weighted activation of all
neighbours then 1 is returned. Otherwise 0 is returned. The “special neighbour”
is a single neighbour designated to have a special status. For IsMax it is just
the one neighbour that all other neighbours are compared to.

MaxDX. The horizontal displacement of the neighbour with the highest weighted
activation level is returned.

MaxDY. The vertical displacement of the neighbour with the highest weighted
activation level is returned.

Thold(f). If the special neighbour's weighted activation level is above f then 1
is returned. Otherwise 0 is returned.

Pow(f). The special neighbour's weighted activation level is raised to the power f.

Expr. Each neighbour may have a symbol associated with it whose purpose is to
stand in for that neighbour’s activation level in an algebraic expression. The
algebraic expression includes the usual operators and common numerical
functions (+, -, *, /, ^, sin, cos, sqrt, and the constant pi). Also,
the following special symbols are used to stand for certain element-specific
parameters or global constants.

c  column index of current element in image (i.e. x-coordinate)
r  row index of current element in image (i.e. y-coordinate)
w  image width
h  image height
o  value set to +1 if the current element is in the top half of the image, and
   to -1 in the bottom half

An element applies its function and then takes on the resultant value. All 
elements on a single layer are updated synchronously. In this implementation im- 
ages are horizontally panoramic but vertically bounded. If an element’s neigh- 
bourhood stretches horizontally past the image boundary then it is wrapped 
around toroidally to the opposite side of the image. If an element’s neighbour- 
hood stretches vertically past the image boundary then the element will be given 
a special out-of-bounds value, causing any subsequent elements that connect to 
this element to also become out-of-bounds. Effectively the image being processed 
will shrink vertically. This approach allows all functions to handle the border in 
a uniform manner. 
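The update scheme described above can be sketched in a few lines of Python. This is only an illustrative sketch under our own assumptions, not the implementation used for the experiments: the names `apply_layer`, `OUT`, `conv`, `mult` and `max_level` are hypothetical, a neighbourhood is represented as a list of (dx, dy, weight) offsets, and the special out-of-bounds value is modelled by `None`.

```python
import math

OUT = None  # hypothetical marker for the special out-of-bounds value


def apply_layer(prev, neighbourhood, func):
    """Update every element of a layer from the layer below it.

    prev          : 2-D list (rows x columns) of activation levels
    neighbourhood : list of (dx, dy, weight) offsets relative to the element
    func          : reduces the list of weighted activations to one value
    """
    h, w = len(prev), len(prev[0])
    new = [[OUT] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            weighted = []
            for dx, dy, weight in neighbourhood:
                rr = r + dy
                cc = (c + dx) % w  # horizontal wrap: the image is panoramic
                if not 0 <= rr < h or prev[rr][cc] is OUT:
                    weighted = None  # vertical overrun or out-of-bounds input
                    break
                weighted.append(weight * prev[rr][cc])
            if weighted is not None:
                new[r][c] = func(weighted)
    return new  # built entirely from prev, so the update is synchronous


def conv(weighted):
    """Conv: sum of the weighted activation levels."""
    return sum(weighted)


def mult(weighted):
    """Mult: product of the weighted activation levels."""
    return math.prod(weighted)


def max_level(weighted):
    """Max: highest weighted activation level."""
    return max(weighted)
```

For example, `apply_layer(image, [(-1, 0, -1), (1, 0, 1)], conv)` computes a horizontal difference that wraps around the panorama, while any neighbourhood with a vertical offset leaves the top or bottom rows out-of-bounds, and that status propagates to every later layer that connects to them.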

3.1 Gaussian Filter 

The input image is assumed to be corrupted with some amount of high-frequency 
noise. Some of this noise is removed with a 5x5 Gaussian filter. Figure 3(a) 
depicts a 2-layer implementation of this filter. Layer 0 is the input layer and
Layers 1 and 2 form the filter. The Gaussian kernel is separable, meaning that 
it can be split into two identical, but orthogonal layers [27]. After processing, 
Layer 2 will be a smoothed version of the input image in Layer 0.
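The separability argument can be illustrated with a short Python sketch. The connection weights of the filter are given only in the figure, so the binomial weights [1, 4, 6, 4, 1]/16 used here are an assumption, as are the function names. One layer smooths along rows (wrapping horizontally) and the next smooths along columns (leaving rows without full vertical support out-of-bounds, modelled here as `None`).

```python
# Assumed 5-tap binomial approximation of a 1-D Gaussian (weights sum to 1).
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
OFFSETS = range(-2, 3)


def blur_rows(img):
    """First filter layer: horizontal 1x5 pass with panoramic wrap-around."""
    w = len(img[0])
    return [[sum(k * row[(c + d) % w] for d, k in zip(OFFSETS, KERNEL))
             for c in range(w)]
            for row in img]


def blur_cols(img):
    """Second filter layer: vertical 5x1 pass; border rows become None."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        if r < 2 or r >= h - 2:
            out.append([None] * w)  # neighbourhood leaves the image vertically
        else:
            out.append([sum(k * img[r + d][c] for d, k in zip(OFFSETS, KERNEL))
                        for c in range(w)])
    return out


def gaussian_5x5(img):
    """Two orthogonal 1-D passes, equivalent to one 5x5 separable kernel."""
    return blur_cols(blur_rows(img))
```

Applied to an image containing a single unit impulse, the response at each surviving position is the product of the corresponding row and column weights, which is exactly the entry of the 5x5 kernel. Each element needs only 5 connections per layer rather than 25 in a single layer.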





(a) Gaussian, X Deriv, Y Deriv 



(b) Y Deriv, Harris-Corner-Extr 



Fig. 3. CVM segments from input image to corner image. Larger spheres are elements, 
smaller spheres show connections. Numerical labels show connection weights. Symbols 
used in the Expr function are shown adjacent to corresponding elements. 



3.2 Corner Extraction 

We briefly review the general method of corner extraction first before describing 
how it is implemented here. The method is known as the Harris detector [27] 
(pages 82-85) and was originally presented in [28]. Consider an image point, p, 
and a small window centred on p. If the window is shifted then we can compare 
the amount of change between the old and new windows. A corner is defined 
as an image point where this degree of change is large for all possible shifts. A 
simplified check for this condition is that, for the shift producing the smallest 
change, the level of change must still be large. An analysis is presented in [28]
which equates this minimum change with the smaller eigenvalue of matrix M,

\[ M = \begin{bmatrix} i & k \\ k & j \end{bmatrix}, \quad \text{where } i = \sum D_x^2, \; j = \sum D_y^2, \; k = \sum D_x D_y \]

$D_x$ and $D_y$ are the spatial derivatives of the image in the x and y directions.
The summations above cover the pixels of a small window centred on p. After a
few algebraic steps we can arrive at an equation for the smaller eigenvalue, $\lambda_s$,

\[ \lambda_s = \frac{i + j - \sqrt{(i + j)^2 - 4(ij - k^2)}}{2} \tag{1} \]

$\lambda_s$ will be the output from the Harris Corner Extraction segment. The first
step in arriving at this value is to estimate spatial derivatives. Figure 3(a) shows
the X Derivative and Y Derivative segments which do just that. We indicate a
particular element at image position (x, y) and layer index l as $V_{x,y,l}$. The value
of element $V_{x,y,3}$ is computed as $V_{x+1,y,2} - V_{x-1,y,2}$, which is the
central-difference approximation of the first derivative in the x direction (up to
a constant factor). The Conv function of Layer 3 combined with its -1 and +1
connections to Layer 2 performs this approximation. Layer 4 squares Layer 3.
That is, $V_{x,y,4} = V_{x,y,3}^2$. This provides us with the value $D_x^2$ necessary
to calculate $\lambda_s$. The Y Derivative segment is similar to the X Derivative
and provides us with $D_y^2$ in Layer 6.

The Harris Corner Extraction segment is shown in figure 3(b). Layer 7 does a
summation from Layer 4 of the X Derivative segment. This calculates $i = \sum D_x^2$.
Similarly, Layer 8 calculates $j = \sum D_y^2$. The inputs to Layer 9 travel back from
figure 3(b) to 3(a). They connect to Layers 3 and 5, the first layers of the X
Derivative and Y Derivative segments. Multiplying them yields $D_x D_y$. Layer 10
then does the summation to achieve $k = \sum D_x D_y$. Finally, Layer 11 implements
equation 1 to calculate $\lambda_s$.
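As a worked illustration of equation 1, the following Python sketch computes the smaller eigenvalue from the three window sums. The function names are hypothetical. The square root is evaluated as hypot(i - j, 2k), which is algebraically identical to the discriminant in equation 1, since (i + j)^2 - 4(ij - k^2) = (i - j)^2 + 4k^2, and cannot become negative through rounding.

```python
import math


def smaller_eigenvalue(i, j, k):
    """Smaller eigenvalue of M = [[i, k], [k, j]], as in equation 1."""
    return (i + j - math.hypot(i - j, 2 * k)) / 2


def corner_strength(dx_window, dy_window):
    """Form the window sums i, j, k from derivative samples, then apply eq. 1.

    dx_window, dy_window: derivative values at the same pixels of a small window.
    """
    i = sum(dx * dx for dx in dx_window)                       # sum of Dx^2
    j = sum(dy * dy for dy in dy_window)                       # sum of Dy^2
    k = sum(dx * dy for dx, dy in zip(dx_window, dy_window))   # sum of Dx*Dy
    return smaller_eigenvalue(i, j, k)
```

A window containing only a straight edge, such as `corner_strength([1, 1, 1], [0, 0, 0])`, scores 0 because the image does not change when shifted along the edge. A window with gradients in two independent directions, such as `corner_strength([1, 0], [0, 1])`, scores 1.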



3.3 Local Maxima Extraction 

Layer 11 holds an image of $\lambda_s$ values indicating the quality of corner at each
position. This may be described as a ‘hilly corner image’. To extract discrete
point features the peaks of these hills must be found. Figure 4(a) shows how
this is achieved. First, another Gaussian Filter is applied in Layers 12 and 13
to further reduce residual noise as well as any noise introduced by the corner
extraction process. Next, local maxima are detected. The IsMax function in
Layer 14 will set its element to 1 only if the corresponding element in Layer 
13 has a value strictly greater than all of its immediate neighbours. The Thold 
function in Layer 15 will set its element to 1 only if the corresponding element 
in Layer 13 has a value greater than the threshold (0.025). Finally, Layer 16 
multiplies these quantities to perform a boolean AND. That is, a point in the 
smoothed corner image (Layer 13) must be both a local maximum AND exceed 
a threshold if it is to be extracted as a corner. 
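The three layers can be sketched together in Python as follows. This is a simplified sketch rather than the CVM wiring itself: we assume that the "immediate neighbours" are the eight surrounding elements, and rows without a full neighbourhood are simply set to 0 instead of being marked out-of-bounds. The function name is hypothetical.

```python
def extract_corners(strength, threshold=0.025):
    """Return a binary image: 1 where an element is a strict local maximum
    (IsMax) AND exceeds the threshold (Thold), 0 elsewhere."""
    h, w = len(strength), len(strength[0])
    corners = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):  # top and bottom rows lack a full neighbourhood
        for c in range(w):
            centre = strength[r][c]
            neighbours = [strength[r + dy][(c + dx) % w]  # horizontal wrap
                          for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dx, dy) != (0, 0)]
            is_max = 1 if all(centre > n for n in neighbours) else 0
            above = 1 if centre > threshold else 0
            corners[r][c] = is_max * above  # multiplication acts as boolean AND
    return corners
```

Because both inputs are restricted to 0 and 1, their product is 1 only when both are 1, which is why a Mult element suffices for the AND in Layer 16.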
(a) Loc-Max-Extr, Snapshot

(b) Gradient Formation, Unit Ring

Fig. 4. CVM segments from corner image, to gradient image, and finally to image
vectors. (a) The IsMax function has one dark line to show connection to the special
neighbour. The symbol beneath Snapshot indicates that this layer requires an outside
signal. (b) “x30” indicates that layer 19 is applied for 30 iterations.



3.4 Snapshot 

Figure 4(a) also depicts the Snapshot layer, Layer 17. This layer simply performs 
a copy of Layer 16. However, the Snapshot layer requires an external signal to 
operate. This signal is made active when the agent is positioned at the goal 
position and allowed to take a snapshot image. If the signal is not active then 
the Snapshot layer will just maintain its storage. 



3.5 Gradient Formation 

Before describing the implementation of gradient formation in the CVM, we first 
review the purpose of forming gradients around detected features of the current 
image. The CGSM homing method requires that correspondences be established 
between current image features (Layer 16) and snapshot image features (Layer 
17), and that these correspondences should be in the form of vectors which point 
from the snapshot image feature to the corresponding current image features. 



Anatomy and Physiology of an Artificial Vision Matrix 






From the perspective of an individual element within a CVM, how can these 
vectors be determined? There can be no ‘search for similar elements’ because 
search is a global operation not permitted within a CVM. One answer is to form 
a gradient image which rises from zero at the position of a current image feature 
and increases in proportion to distance from the feature. Then, correspondence is 
established by assuming that a snapshot image feature matches whatever current 
image feature was responsible for generating the particular ‘well’ of the gradient 
image which the snapshot feature is sitting on. The downhill direction of the 
gradient gives this disparity vector, or image vector. 

We use Hassoun and Sanghvi’s method to form the gradient image [29]. 
The intended goal of their algorithm is to determine the optimal path between 
two points on a grid, and to compute this path with a set of processors which 
are spatially distributed on the same grid. To this end, the first stage of their 
algorithm distributively computes a potential surface whose height gives the 
cost of moving to the source point^1. We use just this aspect of their algorithm
here, with cost defined as distance on the grid. In this case, however, there
are multiple sources for the potential surface. The grid is initialized to 0 where
there are features detected in the feature detection layer, and to some arbitrarily
large value $G_{max}$ everywhere else^2. $V_{x,y}$ will indicate the current value of a grid
element and $V'_{x,y}$ will indicate the new value. The set of neighbours of (x, y) is
$\Pi_{x,y}$. The distance from (x, y)’s neighbour $(p, q) \in \Pi_{x,y}$ to (x, y) is $d_{p,q}$. This
whole method of gradient formation boils down to the following update rule,

\[ V'_{x,y} = \min\left[ V_{x,y}, \; \min_{(p,q) \in \Pi_{x,y}} \left( V_{p,q} + d_{p,q} \right) \right] \tag{2} \]

Figure 4(b) shows the Gradient Formation segment. This segment is unique in 
this CVM in that its second layer (Layer 19) is applied in a loop for 30 iterations. 
On the first iteration this layer is initialized from Layer 18, but thereafter it is 
recurrently connected to itself (Layer 18 is executed only once). Ideally, a number 
of iterations equal to the largest dimension of the image would be used so that the 
gradient would be guaranteed to spread throughout the whole image. However, 
this is a computationally expensive operation and 30 was chosen as a tradeoff 
value which seemed to yield good performance. Layer 18 provides the initial 
conditions for Layer 19. Specifically, it initializes all non-feature points to $G_{max}$
and all feature points to zero. Layer 19 implements equation 2. 

See figure 1 for an example gradient image produced by this segment. 
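The update rule of equation 2 can be sketched in Python as below. Several details are assumptions made for illustration: the neighbourhood is taken to be the eight surrounding elements with Euclidean distances (1 for axial and the square root of 2 for diagonal neighbours), neighbours beyond the top or bottom of the image are simply skipped rather than marked out-of-bounds, and all names are hypothetical. The values of 30 for the initial value and for the iteration count follow the text.

```python
import math

G_MAX = 30.0  # initial value of non-feature points
# Assumed neighbourhood: 8 surrounding elements with Euclidean distances.
NEIGHBOURS = [(dx, dy, math.hypot(dx, dy))
              for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]


def init_gradient(features):
    """Layer 18: 0 at feature points, G_MAX everywhere else."""
    return [[0.0 if f else G_MAX for f in row] for row in features]


def update(v):
    """One synchronous application of equation 2 (Layer 19)."""
    h, w = len(v), len(v[0])
    new = [row[:] for row in v]
    for y in range(h):
        for x in range(w):
            best = v[y][x]
            for dx, dy, d in NEIGHBOURS:
                q = y + dy
                if 0 <= q < h:  # simplification: skip rows outside the image
                    best = min(best, v[q][(x + dx) % w] + d)  # horizontal wrap
            new[y][x] = best
    return new


def form_gradient(features, iterations=30):
    """Initialise from the feature image, then iterate the update rule."""
    v = init_gradient(features)
    for _ in range(iterations):
        v = update(v)
    return v
```

Each iteration lets the distance information spread by one element, so after k iterations every element within k steps of a feature holds its grid distance to the nearest feature, and elements further away still hold the initial value.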



3.6 Ring Operator 

Figure 4(b) also shows the Unit Ring segment. This segment calculates vectors
describing the downhill direction of the gradient image at positions where a
feature in the snapshot layer exists. Layers 20 and 21 have ring-shaped neighbourhoods
connecting to the output of the Gradient Formation segment. These
two layers determine the x and y components (respectively) of the required vector.
Using the symbols in the figure^3 this vector is [x, y]. However, this vector
should only exist where there is a snapshot feature. Thus, Layers 22 and 23 multiply
the x and y components by s, which stands for a connection to the snapshot
layer. If s is 0, meaning that there is no snapshot feature at this position, then
the vector [x, y] will be zeroed. These expressions also divide [x, y] by its length,
making it a unit vector. The negative signs in Layers 22 and 23 invert the image
vector [x, y] so that it points downhill on the gradient image.

^1 A device-efficient hardware implementation of this same idea is presented in [30].

^2 Gmax is set here to 30, the number of iterations of the gradient formation layer.



3.7 Vector Mapping 

The last two layers of the Unit Ring segment encode image vectors expressing the
movement of features within the image. The task of the Vector Mapping segment
is to transform these image vectors into agent motion vectors. An image vector $\vec{v}$
is the movement of a feature in the image given the movement of the agent in the
plane (the agent motion vector). An approximate method was presented in [1] to
obtain the agent motion vector from $\vec{v}$ and $\theta_s$, the angular position of the
snapshot feature within the image. Unfortunately, space limitations prevent repeat
coverage of this method here. The method is achieved by equation 3, which has been
implemented in this CVM as shown in figure 5(a).

\[ \text{agent motion vector} = \begin{bmatrix} v_y \sin\theta_s - v_x \sin(\theta_s + 90^\circ) \\ v_y \cos\theta_s - v_x \cos(\theta_s + 90^\circ) \end{bmatrix} \tag{3} \]
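Equation 3 can be evaluated directly, as in the following Python sketch. The function name and the convention that the angular position is supplied in degrees are assumptions made for illustration. Since sin(θ + 90°) = cos θ and cos(θ + 90°) = -sin θ, the two rows reduce to v_y sin θ - v_x cos θ and v_y cos θ + v_x sin θ, so the mapping preserves the length of the image vector.

```python
import math


def agent_motion_vector(vx, vy, theta_s_deg):
    """Evaluate the two rows of equation 3 for one image vector.

    vx, vy      : components of the image vector
    theta_s_deg : angular position of the snapshot feature, in degrees (assumed)
    """
    theta = math.radians(theta_s_deg)
    shifted = math.radians(theta_s_deg + 90.0)
    first = vy * math.sin(theta) - vx * math.sin(shifted)
    second = vy * math.cos(theta) - vx * math.cos(shifted)
    return (first, second)
```

For instance, a purely horizontal image vector (1, 0) for a feature at angular position 0° maps to approximately (-1, 0), and any input of length 5 yields an output of length 5.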



3.8 Vector Normalization and Summing 

In figure 5(b), the agent motion vectors produced in Layers 26 and 27 are
normalized. This is the final step of the processing chain. Now the final output of 
the CVM is determined by summing the normalized agent motion vectors stored 
in Layers 28 and 29. This is illustrated by the large circle in figure 5(b) which 
produces the two-dimensional vector used to move the agent. 
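A minimal Python sketch of this final step follows. The names are hypothetical, and we assume that positions without a snapshot feature hold zero vectors, which are left at zero rather than normalized.

```python
import math


def normalize(vx, vy):
    """Scale a vector to unit length; zero vectors (no feature) stay zero."""
    length = math.hypot(vx, vy)
    if length == 0.0:
        return (0.0, 0.0)
    return (vx / length, vy / length)


def home_vector(agent_vectors):
    """Sum the normalized agent motion vectors into one home vector."""
    units = [normalize(vx, vy) for vx, vy in agent_vectors]
    return (sum(u[0] for u in units), sum(u[1] for u in units))
```

With this normalization every feature casts one vote of equal weight, so a single badly matched feature cannot dominate the sum. For example, `home_vector([(3, 4), (0, 0), (0, -2)])` is approximately (0.6, -0.2).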

4 Results 

In [1] CGSM was compared with a model we referred to as the extracted Average
Landmark Vector (XALV) model (from [3]) and found to exhibit superior performance.
Here we compare the performance of CGSM with Franz et al.’s warping
method [7]. Other biorobotics researchers have made comparisons between their
own methods and the warping method, both favourable and unfavourable. For
example, Weber et al. found that their method exceeded the performance of
the warping method; however, the comparison was made only in simulation [9].
Möller compared robotic implementations of the snapshot model, XALV, and
the warping method and found that the warping method generally performed
the best and was the least sensitive to parameter settings [31]. The warping
method searches through the space of possible movements away from the home
position. For each possible movement the snapshot image is warped to appear
as if the robot had performed that movement away from the snapshot location.
The parameters of the warped image that is most similar to the current image
are used to compute the home vector.

^3 Variables used in the Expr function are recycled. There are several places where the
values used are x and y but these values are local to each Expr function.

(a) Vector Mapping

(b) Vector Normalization, Summing

Fig. 5. CVM segments to produce (a) agent motion vectors; (b) normalized agent
motion vectors. The large circle in (b) indicates the summation of Layers 28 and 29 to
produce the x and y components of the final home vector.

As in [1] we compare the two methods on the 300 panoramic images used in
[3] (see the top left image in figure 1 for an example image from this database).
This database of images was taken using an omnidirectional imaging system
on a 9 m x 3 m grid of positions in an unmodified university building entrance
hall. The unwrapped panoramic images used here are 180 x 48 pixels in size. As
a thorough test of performance each of the 300 images is used in turn as the
snapshot image, with each such round of trials generating a set of home vectors
such as in figure 6. Once all home vectors are generated we attempt to find a path
home from every grid position. The proportion of positions from which
homing succeeds, averaged over all snapshot images, is the return ratio. Table 1
reports the return ratio for CGSM and the warping method. As the left-most column
of numbers in this table indicates, the warping method exhibits far superior
homing performance on the undisturbed images. Some example homing maps are
shown in figure 6 for both methods. These particular maps are for the snapshot
position at coordinates (210cm, 180cm).
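To make the return ratio concrete, the following Python sketch computes it for a single snapshot position. The path-following rule is an assumption made for illustration and is not necessarily the procedure of [1]: each home vector is quantized to a step towards one of the eight neighbouring grid positions, and a trial fails if the path leaves the grid, revisits a position, or meets a zero vector. All names are hypothetical.

```python
import math

# Grid steps for the eight directions, indexed by multiples of 45 degrees.
MOVES = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def can_return(home_vectors, start, goal):
    """Follow quantized home vectors from start; True if the goal is reached."""
    h, w = len(home_vectors), len(home_vectors[0])
    pos, visited = start, set()
    while pos != goal:
        if pos in visited:
            return False  # the path loops and never reaches the goal
        visited.add(pos)
        vx, vy = home_vectors[pos[1]][pos[0]]
        if vx == 0 and vy == 0:
            return False  # no direction available
        octant = round(math.atan2(vy, vx) / (math.pi / 4)) % 8
        dx, dy = MOVES[octant]
        pos = (pos[0] + dx, pos[1] + dy)
        if not (0 <= pos[0] < w and 0 <= pos[1] < h):
            return False  # the path leaves the grid
    return True


def return_ratio(home_vectors, goal):
    """Fraction of non-goal grid positions from which homing succeeds."""
    h, w = len(home_vectors), len(home_vectors[0])
    starts = [(x, y) for y in range(h) for x in range(w) if (x, y) != goal]
    return sum(can_return(home_vectors, s, goal) for s in starts) / len(starts)
```

Averaging `return_ratio` over all 300 snapshot positions would give an overall figure comparable in kind to those reported in the table, under the assumptions stated above.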




(c) CGSM 



(d) CGSM, One Block 



Fig. 6. Homing vectors for position 210cm x 180cm. “One Block” means corruption
by one block (see text). White cells indicate successful homing. Grey cells give the
approach index for unsuccessful homing trips (see [1]).



There are, however, at least two particular aspects of the experiment so 
far which are unrealistic from the perspective of insect modelling. Firstly, an 
insect such as a honeybee does not have truly panoramic vision. Honeybees in 
particular have a ‘blind spot’ of about 50° in the rear [32]. Secondly, an insect’s 
world is dynamic. We expect disruptions in the environment to impair homing
performance; however, robustness in the face of modified surroundings is critical.

To model both of these conditions we have constructed three corrupted versions
of the 300 image database. The corruption is applied simply by blotting out
(setting to zero) one, two, or three randomly positioned 40 x 48 vertical blocks
within the image. If the blocks happen not to coincide, then this represents a loss
of 22.2%, 44.4%, or 66.6% of the information possibly contained in the image (for
one, two, and three blocks respectively). Each round of homing (300^2 individual
trials) is now based on snapshot images from the original image database and
current images from the disrupted database. The final three columns of Table 1
show how the warping method’s performance is severely affected by these disruptions
while CGSM exhibits graceful degradation and superior performance
to the warping method for all the disrupted cases. Figure 6 shows how the home
vectors for both methods change when a single corruption block is added.
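The corruption procedure and the quoted loss figures can be checked with a short Python sketch. The names are hypothetical, and we assume that a block never wraps around the horizontal image boundary, which the text does not specify. Each block spans the full image height, so the lost fraction depends only on widths: 40/180 is about 22.2% per non-overlapping block.

```python
import random

WIDTH, HEIGHT, BLOCK_WIDTH = 180, 48, 40  # image and block sizes from the text


def blot_out(image, n_blocks, rng):
    """Return a copy of image with n_blocks full-height blocks set to zero.

    Blocks may overlap. Assumption: blocks do not wrap past the right edge.
    """
    corrupted = [row[:] for row in image]
    for _ in range(n_blocks):
        start = rng.randrange(WIDTH - BLOCK_WIDTH + 1)
        for row in corrupted:
            row[start:start + BLOCK_WIDTH] = [0] * BLOCK_WIDTH
    return corrupted


def max_loss_percent(n_blocks):
    """Percentage of the image lost when the blocks do not overlap."""
    return 100.0 * n_blocks * BLOCK_WIDTH / WIDTH
```

Here `max_loss_percent` gives 22.2 and 44.4 for one and two blocks when rounded to one decimal place. The value for three blocks is 66.66..., which the text truncates to 66.6%.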



Table 1. Return ratios for the two methods under varying conditions.

                  Conditions
Method            Undisturbed   1 Block   2 Blocks   3 Blocks
CGSM              69.5%         63.8%     57.1%      52.5%
Warping Method    94.3%         34.2%     19.0%      15.3%



5 Discussion 

The warping method achieves its excellent result on the undisturbed image 
database by exploiting global image information. All parts of the image con- 
tribute to the calculated home vector. While CGSM does not perform nearly 
as well as the warping method on the undisturbed database, it remains rela- 
tively unaffected by large transient changes to the image database. This is due 
to CGSM’s inherent functional parallelism. Approximate home vectors are gen- 
erated for each feature of the snapshot image. The computation of these home 
vectors occurs independently and in parallel. These vectors are finally summed 
to yield the overall result, but this is intended only to offset the possible negative 
impact of incorrect vectors. The final summation is the first and only time when 
these multiple results converge. 

That CGSM’s parallelism should turn out to provide robust performance is 
an interesting result. Functional parallelism was not one of the intended features 
of CGSM. Instead, the purpose of CGSM’s parallel computing style was to ensure 
biological plausibility, and to enable the possibility of parallel implementation
for improvement of algorithm complexity. However, we are now encouraged to 
look to parallelism for performance advantages as well. 

6 Conclusions 

We have presented here a detailed account of the structure and function of 
a cellular vision matrix for visual homing. It has been shown how operations 
such as corner extraction, local maxima extraction, gradient formation, and the 
vector operations could all be achieved by simple locally-connected elements 
arranged retinotopically. Experiments compared the performance of CGSM with 
the warping method and found that while CGSM’s performance is not as strong 
as the warping method under ideal conditions, its performance degrades far more
gracefully under conditions of large environmental modifications and occlusion. 

Much work remains for developing and exploring CGSM. Homing perfor- 
mance has not yet been tested live on a free-roving robotic platform. Also, 
a thorough analysis of error conditions remains to be completed. Work in
progress includes the development of an exact vector mapping method and
alternative means for locally determining feature correspondence. Finally, it will 
be interesting to see what other sorts of projects the cellular vision concept can 
be applied to. Cellular vision was inspired by research on biological systems de- 
scribed at a particular level of detail (low-level, but above that of neural wiring). 
It remains to be seen how useful this concept may be both in the creation of
artificial vision systems and, perhaps, in the understanding of natural vision
systems.
Abstract. Reliable group communication is essential for building applications
in distributed computing systems. Epidemic-style algorithms for
group communication have attracted increasing interest. They emulate 
the spread of an infection; each computing node communicates with its 
randomly selected partners and information is disseminated by multi- 
ple rounds of such communication. Previous research has revealed that 
they are highly scalable and easy to deploy. In this paper we propose 
an adaptive mechanism with the aim of enhancing resiliency of these 
algorithms to perturbations, such as node failures. The mechanism dy- 
namically adjusts the fanout, the number of receiver partners each node 
selects, to changes in the environment. Two metrics are used for this 
purpose, which reflect the status local to the node itself and the behav- 
ior of the whole system. This mechanism is analogous to those that can 
be seen in many biological systems, where each constituent part behaves 
mainly independently but is controlled indirectly by the whole system. 



1 Introduction 

Reliable algorithms for group communication are essential for building applica- 
tions in distributed computing environments. The growth of the Internet has 
influenced the scale and the reliability requirements of distributed systems. Tra- 
ditional solutions applicable in small-scale settings often do not scale well to very 
large system sizes. 

Epidemic or gossip communication algorithms have received increasing attention
as an alternative to the traditional algorithms [1, 2, 3, 4]. As the name
suggests, these algorithms have their analogy in epidemiology.

The essential characteristic of epidemic algorithms is that information ex- 
changes, or infections, occur between sender nodes and randomly chosen receiver 
nodes. Information is disseminated throughout the system by multiple rounds 
of such communication. 

Research has shown that these algorithms scale well to large systems and
are easy to deploy. Moreover, fault tolerance is achieved, because a node can
receive copies of a message from different nodes. For the same reason nodes do
not need to wait for acknowledgments. In addition, just as a virus spreads in a
totally decentralized manner, no node has a specific role to play; thus a failed
node will not prevent other nodes from continuing to send messages. Hence, there is no



A.J. Ijspeert et al. (Eds.): BioADIT 2004, LNCS 3141, pp. 306-316, 2004.
© Springer-Verlag Berlin Heidelberg 2004




An Adaptive Mechanism for Epidemic Communication 307 



need for failure detection or specific recovery action. Due to these properties, 
epidemic algorithms have been used in various contexts, such as distributed 
databases [5, 6, 7, 8] and distributed failure detection [9, 10].

This high resiliency to failures is, however, challenged in large-scale settings. 
In traditional epidemic algorithms, nodes choose receiver nodes from all mem- 
bers in a system. Therefore these algorithms require that each node should know 
every other node, thus limiting the applicability of the algorithms in large-scale 
settings. To address this problem, recent epidemic algorithms are designed so 
that they can operate even when nodes have partial knowledge of membership 
[2,11,12]. Although these algorithms are scalable, resiliency is hampered to a sub- 
stantial extent, because incomplete membership knowledge can limit potential 
paths of message passing. 

For example, the analysis results presented in [13] demonstrate that Scamp, 
an epidemic algorithm which uses partial views, can achieve very high reliability 
on average, regardless of the small size of views. In [13], however, it is also 
reported that the reliability of each broadcast exhibits a bimodal behavior; either 
the reliability is as good as in an epidemic algorithm using full membership 
knowledge or the broadcast does not spread at all. 

In this paper we propose an adaptive mechanism with the aim of enhanc- 
ing resiliency of epidemic-style broadcast algorithms. According to the changes 
in environment conditions, such as node failures, the mechanism dynamically 
adjusts the fanout, the number of partners each node selects. For this purpose 
two metrics are used, which reflect the status of a node itself and that of the 
whole system. This mechanism is analogous to some biological functions of liv- 
ing organisms, such as metabolism, where each constituent part behaves mainly 
independently but is controlled indirectly by the whole biological system. 

Adaptive mechanisms for epidemic algorithms have not been sufficiently ex- 
amined so far. The only adaptive algorithm we know of is the one proposed in 
[12]. This algorithm adjusts the rate of message emission to the memory size 
available for buffering broadcast messages and to the global level of congestion. 
The main purpose of that algorithm is to adapt to limited memory resources
while maintaining broadcast reliability; its goal is thus different from ours.

The remaining part of this paper is organized as follows. In Sect. 2 we outline 
epidemic algorithms. In Sect. 3 we present a basic idea behind the proposed 
adaptive mechanism and give an epidemic algorithm employing this mechanism. 
Results of an experiment are presented in Sect. 4. Section 5 concludes the paper. 

2 Background 

2.1 Basic Algorithm 

Figure 1 shows a typical epidemic algorithm. The value f denotes the fanout.

The algorithm works as follows. When a node initiates a broadcast message
m, it sends m together with TTLinit to f randomly selected nodes. Here TTLinit
represents the initial value of Time-To-Live (TTL), which is the maximum number
of hops that the message can take before expiration.
initiate broadcast of m:
    send (m, TTLinit) to f randomly chosen receiver nodes;

upon receiving broadcast message (m, TTL):
    if m has been received for the first time
        TTL := TTL - 1;
        if TTL > 0
            send (m, TTL) to f randomly chosen nodes;

Fig. 1. A simple epidemic algorithm.



initiate broadcast of m:
    send (m, TTLinit) to f nodes randomly chosen from view_i;

upon receiving broadcast message (m, TTL):
    if m has been received for the first time
        TTL := TTL - 1;
        if TTL > 0
            send (m, TTL) to f nodes randomly chosen from view_i;

Fig. 2. An epidemic algorithm using partial membership knowledge.



Upon receiving a broadcast message m and its associated TTL value, a node
checks whether it has already been infected, that is, whether it has already
received m. If the node is already infected, it simply drops the message.
Otherwise it decrements TTL by 1. If TTL reaches 0, then the node drops the
message. If the value is still positive, the node, in turn, forwards m with TTL
to f randomly selected nodes.

For the sake of simplicity of illustration, we use throughout this paper the 
same fanout value for both initiating and relaying a broadcast message. However 
the algorithms shown in this paper can easily be modified to use different fanout 
values for these two purposes (as in [6]). 
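
As an illustration, the procedure of Fig. 1 can be sketched as a small round-based simulation in Python. This is a minimal sketch and not the authors' implementation: the function name `epidemic_broadcast`, the synchronous-round scheduling, and the choice of f distinct receivers other than the sender are assumptions made for the example.

```python
import random


def epidemic_broadcast(n, fanout, ttl_init, initiator=0, rng=None):
    """Simulate one broadcast in a system of n nodes (ids 0..n-1).

    Returns the set of infected nodes, i.e. nodes that hold the message.
    Assumption: receivers are distinct and never include the sender.
    """
    rng = rng or random.Random()

    def pick(sender):
        others = [j for j in range(n) if j != sender]
        return rng.sample(others, min(fanout, len(others)))

    infected = {initiator}
    # Messages in flight, as (receiver, TTL) pairs.
    in_flight = [(r, ttl_init) for r in pick(initiator)]
    while in_flight:
        next_round = []
        for receiver, ttl in in_flight:
            if receiver in infected:
                continue  # already infected: drop the duplicate
            infected.add(receiver)
            ttl -= 1
            if ttl > 0:  # still alive: forward to f nodes
                next_round.extend((r, ttl) for r in pick(receiver))
        in_flight = next_round
    return infected
```

With `ttl_init = 1` the message stops at the initiator's direct receivers, so exactly `fanout + 1` nodes are infected; larger TTL values let the infection spread over further rounds.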

2.2 Partial Views of Membership 

In traditional epidemic algorithms, nodes choose receiver nodes from all members 
in a system. These algorithms thus rely on the assumption that each node knows 
every other node. This limits their applicability in large-scale settings. 

To address this issue, several new algorithms have been developed that can 
operate with partial knowledge of membership. Such an algorithm is shown in 
Fig. 2. In this algorithm, nodes maintain a view of membership. On receiving a 
broadcast message, each node sends messages to other nodes in its view. Each 
node selects f random members from the view and sends messages to all these
selected members. We denote the view of node i by view_i.
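
The selection step with partial views can be sketched as follows. The helper names `build_views` and `pick_receivers` are hypothetical, and building each view by uniform random sampling is an assumption for illustration only; the text does not prescribe how views are constructed or maintained.

```python
import random


def build_views(n, view_size, rng):
    """Give every node a view of `view_size` other nodes.

    Assumption: views are drawn uniformly at random and are static.
    """
    return {
        i: rng.sample([j for j in range(n) if j != i], view_size)
        for i in range(n)
    }


def pick_receivers(view, fanout, rng):
    """Select `fanout` distinct members of a node's view."""
    return rng.sample(view, min(fanout, len(view)))
```

A node i would then forward a message to `pick_receivers(views[i], f, rng)` instead of sampling from the whole membership.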

As stated in the first section, applying partial views improves the scalability 
of epidemic algorithms but decreases the reliability to a substantial degree. The 
proposed adaptive mechanism presented in the next section can be used regard- 
less of the assumption about membership knowledge on which the underlying 
epidemic algorithm is based; but we believe that it can be more useful when 
only partial membership knowledge is available for individual nodes. (We should 
remark that this argument does not apply to Scamp [13], a partial-membership- 
view-based algorithm, because in Scamp partial view construction is probabilis- 
tic but message transmission is deterministic; each node sends messages to all 
members in its view. This is also the case for the Harary-graph-based flooding 
algorithm presented in [14].) 

3 Adaptive Mechanism 

3.1 Intuition 

As stated above, we propose an adaptive mechanism to maintain the reliability of
epidemic communication in the presence of perturbations, such as node or
communication failures. The proposed mechanism allows each node to adjust its
fanout value to the changes in environmental conditions. In the rest of the paper
we denote the fanout of node i by f_i.

The challenge lies in ensuring that nodes are able to perceive the quality of 
epidemic communication, without interacting explicitly with other nodes of the 
system. Such interaction would hamper the scalability and the implementation 
feasibility of the algorithm. 

The intuitive idea underlying our mechanism is as follows. Each node counts
how many times it receives each broadcast message. Nodes thus can
perceive the status of the whole system; if the same message has been received
too many times, the nodes are probably using too large a fanout value. On the
other hand, if many broadcast messages have arrived only once, it is likely that
there were other messages that failed to reach the node. An important observation
is that this information can be obtained independently by each node with no
additional overhead.

In addition, to perceive information more specific to individual nodes, each
node in our mechanism deals with the messages initiated by itself differently
from the other messages. The rationale is that the fanout of the initiator node
affects the dissemination more directly than those of other nodes do. This is
analogous to a viral epidemic, where the number of people who are initially
infected has a significant effect on the likelihood of the epidemic spreading.

Specifically, each node uses two metrics to adapt to changes in system status.
One metric, denoted by G, is the average number of times each of the latest k
broadcast messages has been received. This metric can be considered to reflect
the global status of the whole system. The other metric, denoted by I, is how
many times the latest broadcast message that was initiated by the node itself has
been received from the other nodes. This metric can be considered to mainly reflect
the local status of that node.

Adjustment is performed using these metrics as follows. Suppose that θ_G and
θ_I represent the desirable values for G and I respectively. If G exceeds θ_G,
f_i is decreased; otherwise, that is, if G is lower than θ_G, f_i is increased.

When a node has initiated a broadcast, the control can be more specific to that
node, by using I. Basically, if I is smaller than θ_I, f_i is increased. However,
if G is greater than θ_G, the increase is reduced. On the other hand, if G is
smaller than θ_G, the increase is accelerated. When f_i is decreased, the
opposite applies; that is, f_i is decreased if I is greater than θ_I. The
decrease in f_i is amplified if G is large, while it is reduced if G is small.

Our mechanism thus assumes that nodes may forward a broadcast message to 
its initiator node. One might think that this incurs useless message transmission 
and should be avoided. We think, however, that this assumption can be justified 
for the following reason. In some algorithms, like lpbcast [2], nodes combine
different broadcast messages into one message. In that case nodes may receive 
messages that they have initiated, and thus no overhead is introduced. 

We remark that this concept of our mechanism is somewhat inspired by bio- 
logical functions of living organisms, where no explicit centralized control exists 
but local activities of subsystems are adjusted globally in a loose manner. Gly- 
colytic metabolism is such an example. Phosphofructokinase plays a key role in 
the regulation of the glycolytic pathway, but the activity of this enzyme for fruc- 
tose 6-phosphate is controlled loosely by the concentration of ATP (adenosine 
triphosphate); that is, the activity is lowered by high levels of ATP [15]. This 
phenomenon can be viewed as a mechanism for maintaining the total quantity of
ATP at an appropriate level. In the proposed adjustment mechanism, the value
of G plays a role similar to that of ATP in this example.

3.2 Adjustment Algorithm 

There are many ways to implement the idea mentioned above. Here we present 
a simple algorithm for illustration purposes. This algorithm is a modification of 
the one shown in Fig. 2. 

As stated above, the purpose of our mechanism is to adjust the fanout f_i of
each node i to changes in the system. In order to control the fanout value
accurately, we design the algorithm so that f_i can have a continuous value.
Specifically, in the algorithm a node i forwards a broadcast message to at least
⌊f_i⌋ nodes. In addition node i forwards the message to another node with
probability f_i − ⌊f_i⌋. This results in f_i receivers on average.
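
The fractional fanout can be illustrated with a short sketch. The function name `num_receivers` is hypothetical; the code only shows the rounding rule described above, namely the floor of f_i plus one extra receiver with probability equal to the fractional part.

```python
import math
import random


def num_receivers(f_i, rng):
    """Number of receivers for one forwarding step with continuous fanout f_i."""
    base = math.floor(f_i)
    # One additional receiver with probability f_i - floor(f_i).
    extra = 1 if rng.random() < f_i - base else 0
    return base + extra
```

For example, with f_i = 3.4 a node contacts 3 nodes in about 60 percent of the forwarding steps and 4 nodes in about 40 percent, so the expected number of receivers is 3.4.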

Figure 3 shows the algorithm which incorporates the proposed mechanism. 
In the algorithm two new procedures are used for updating the two metrics and 
adjusting the fanout based on them. 

Suppose that T is a time value such that all messages containing a broadcast 
initiated at time t were dropped by time t + T with sufficiently large probability. 
If such T is given, the value of G can be updated when T has elapsed after a new 







initiate broadcast of m:
    send (m, TTLinit) to ⌊f_i⌋ nodes randomly chosen from view_i;
    choose another node in view_i with probability f_i − ⌊f_i⌋ and
        send (m, TTLinit) to that node;
    count_m := 0;

at time T after initiating m:
    I := count_m;
    f_i := adjust2(I, G);

upon receiving broadcast message (m, TTL):
    if m has been received for the first time
        TTL := TTL - 1;
        count_m := 1;
        if TTL > 0
            send (m, TTL) to ⌊f_i⌋ nodes randomly chosen from view_i;
            choose another node in view_i with probability f_i − ⌊f_i⌋ and
                send (m, TTL) to that node;
    else
        count_m := count_m + 1;

at time T after m that was initiated by another node was received for
the first time:
    add count_m to FIFO queue Q of length k;
    f_i := adjust1(G);

Fig. 3. Proposed adaptive algorithm.



broadcast message was received for the first time. Similarly I can be updated 
when T has elapsed after a new broadcast was initiated. In practical situations 
one could set T to (average message delay). Adjustment of fanout is 

performed when these metrics are updated. To compute G, the number of times 
of message reception needs to be averaged over the latest k broadcast messages. 
For this purpose the queue Q of length k is used in Fig. 3. 
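
The bookkeeping for the two metrics can be sketched in Python as follows. The class and method names are hypothetical, and the sketch assumes, as the text does, that no copy of a message arrives after its timeout T has expired.

```python
from collections import deque


class ReceptionCounter:
    """Per-node bookkeeping for the metrics G and I (illustrative sketch)."""

    def __init__(self, k):
        self.counts = {}              # message id -> receptions so far (count_m)
        self.queue = deque(maxlen=k)  # FIFO queue Q holding the latest k counts
        self.I = None                 # count for the latest self-initiated message

    def on_initiate(self, msg_id):
        self.counts[msg_id] = 0

    def on_receive(self, msg_id):
        """Count one reception; return True if the message is new to this node."""
        first_time = msg_id not in self.counts
        self.counts[msg_id] = self.counts.get(msg_id, 0) + 1
        return first_time

    def on_timeout(self, msg_id, initiated_here):
        """Called at time T after the message was initiated or first received."""
        count = self.counts.pop(msg_id)
        if initiated_here:
            self.I = count
        else:
            self.queue.append(count)  # the oldest entry is evicted automatically

    @property
    def G(self):
        """Average reception count over the queued messages, or None if empty."""
        return sum(self.queue) / len(self.queue) if self.queue else None
```

A bounded `deque` is used here because it discards the oldest count once k entries are stored, which matches the role of a fixed-length FIFO queue.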

Now the remaining problem is the design of the two sub-procedures,
adjust1(G) and adjust2(I, G), which are used to determine a new value of f_i.
Again, for illustration purposes, we present simple examples in this paper. These
procedures first compute the change Δ in f_i based on the parameters and then add
it to f_i.



adjust1(G): When G has been updated, a node computes Δ as follows.

$$\Delta := \alpha(\theta_G - G)$$

where α (> 0) is a constant parameter. Clearly Δ > 0 if G < θ_G, while Δ < 0 if
G > θ_G. Δ takes 0 iff θ_G = G.

adjust2(I, G): When I is updated, Δ is subsequently computed as follows.

$$\Delta := \begin{cases} (\theta_I - I)\,\dfrac{\beta}{1 + e^{\gamma(G - \theta_G)}} & \text{if } I < \theta_I \\[1ex] (\theta_I - I)\,\dfrac{\beta}{1 + e^{-\gamma(G - \theta_G)}} & \text{if } I \ge \theta_I \end{cases}$$

where β (> 0) and γ (> 0) are constant parameters. The first half of the right part
of the above substitution signifies that f_i is increased if I < θ_I; it is decreased
otherwise. The last half, which always has a positive value ranging from 0 to β,
represents the effect of G. If I < θ_I, the value of this part decreases with the
value of G, which means that the incremental change in f_i is reduced by a high
value of G. If I > θ_I, on the other hand, it increases with G; that is, the decrease
is amplified by a high value of G. In both cases, it equals β/2 iff G = θ_G.

α, β, and γ, as well as k, θ_I, θ_G, T, and TTLinit are configuration parameters
of the algorithm. Although we do not discuss the selection of values for these
parameters here, it should be addressed in future work.

Once Δ has been computed, f_i is adjusted as follows.

$$f_i := \begin{cases} \max\{f_i + \Delta,\ f_{min}\} & \text{if } \Delta < 0 \\ \min\{f_i + \Delta,\ f_{max}\} & \text{if } \Delta \ge 0 \end{cases}$$

where f_max and f_min are specified upper and lower bounds on f_i.

A reasonable value for f_max is |view_i|, because f_i cannot be larger than the
view size. Also θ_G is a reasonable choice for f_min, since f_i needs to be at
least θ_G in order for all nodes to receive θ_G copies of a broadcast message.
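
The two adjustment procedures and the bounding step can be sketched in Python. This is one reading of the description, not a definitive implementation: it assumes that adjust1 uses the change α(θ_G − G), and that adjust2 multiplies (θ_I − I) by a logistic weight in G that lies between 0 and β and equals β/2 when G = θ_G. The explicit parameter lists and the helper name `clamp_step` are additions for the example.

```python
import math


def clamp_step(f_i, delta, f_min, f_max):
    """Apply the change delta to f_i while keeping f_i within its bounds."""
    if delta < 0:
        return max(f_i + delta, f_min)
    return min(f_i + delta, f_max)


def adjust1(f_i, G, theta_G, alpha, f_min, f_max):
    """Adjustment driven by the global metric G only."""
    delta = alpha * (theta_G - G)
    return clamp_step(f_i, delta, f_min, f_max)


def adjust2(f_i, I, G, theta_I, theta_G, beta, gamma, f_min, f_max):
    """Adjustment driven by the local metric I, modulated by G (assumed form)."""
    # An increase (I < theta_I) is damped by a high G;
    # a decrease (I >= theta_I) is amplified by a high G.
    sign = 1.0 if I < theta_I else -1.0
    weight = beta / (1.0 + math.exp(sign * gamma * (G - theta_G)))
    delta = (theta_I - I) * weight
    return clamp_step(f_i, delta, f_min, f_max)
```

For instance, with θ_G = θ_I = 4, β = 1.0 and G = 4, a node whose own broadcast came back only twice (I = 2) would raise its fanout by 1 under this reading, since the logistic weight is then β/2.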

4 Experimental Results 

This section presents the results of an experiment. In this experiment we assumed 
a simple model in order to obtain preliminary results that demonstrate the fun- 
damental behavior of the proposed algorithm. The model makes the following 
assumptions: 

- Each node initiates broadcast at the same rate.

- The time gap between two consecutive broadcast initiations is sufficiently
  large compared to the time required for the spread of a broadcast.

- There are upper and lower bounds on the message delay between two nodes,
  and the difference between these two bounds is negligible compared to the
  delay itself. Thus:

  • The spread of messages proceeds in synchronous rounds.

  • The appropriate value of T can be determined.

In general these assumptions do not hold in reality; but we think that this model 
suffices for our illustrative purpose. 

Here we show a typical run of the proposed algorithm in the presence of
perturbations. We considered a system composed of 200 nodes and assumed that
broadcast was initiated 1000 times in this run. As in [2, 12], the membership view
view_i of each node i was composed of the same number of randomly chosen
nodes. In this experiment |view_i| was set to 16. f_max and f_min were set to
|view_i| (= 16) and θ_G, respectively, for the reason stated in the previous section.
TTLinit was set to 7 since many Gnutella clones, which are well-known P2P
applications and adopt a conceptually similar communication mechanism, use
this value. The other constant parameters were set as follows: α = 0.01, β = 1.0,
γ = 2.0, and k = 10. The initial value of f_i was set to 7, while θ_G and θ_I were
both set to 4.

Two kinds of perturbations were assumed to occur as follows. 

- 60 percent of the nodes failed simultaneously.

- After the first perturbation, 50 percent of the failed nodes recovered
  and re-entered the system simultaneously.
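
The two perturbations can be sketched as operations on the set of non-failed nodes. The function names, the uniform random choice of which nodes fail or recover, and the rounding of the node counts are assumptions made for this example; the text only gives the percentages.

```python
import random


def fail_nodes(alive, fraction, rng):
    """Fail `fraction` of the alive nodes at once; return (alive, failed)."""
    k = round(len(alive) * fraction)
    failed = set(rng.sample(sorted(alive), k))
    return alive - failed, failed


def recover_nodes(alive, failed, fraction, rng):
    """Bring `fraction` of the failed nodes back; return (alive, failed)."""
    k = round(len(failed) * fraction)
    back = set(rng.sample(sorted(failed), k))
    return alive | back, failed - back
```

With 200 nodes, the first perturbation leaves 80 nodes running, and the second brings the system back to 140 running nodes.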

Figure 4 shows how the algorithm behaved in this situation. For comparison 
purposes, we also show in Fig. 5 the behavior of the algorithm shown in Fig. 2. 
The fanout f was set to 7 for this algorithm. In these graphs, the x-axis denotes
the time. The y-axis shows the number of nodes that a broadcast reached; that 
is, it represents the reliability of broadcast. For readability the x-axis is scaled 
so that the time between two consecutive broadcast events is one time unit. 

As can be seen in these figures, both algorithms achieve high reliability before 
the first perturbation and after the second perturbation. In contrast, the non- 
adaptive algorithm shows relatively low reliability during the period between the 
two perturbations. The proposed algorithm, on the other hand, seems to have
adapted well to the first perturbation too. Although it exhibited lower reliability
just after the occurrence of this perturbation, the reliability increased gradually
as time advanced.

Figure 6 shows how the fanout value changed dynamically in this run. In 
this graph the solid curve represents the average of the fanout value over non- 
failed nodes. The dotted line represents the fanout value for the non-adaptive 
algorithm, which was fixed to 7. As shown in this figure, the proposed algorithm 
adapts well to the changes in the system by changing the fanout value accord- 
ingly; if nodes failed, then the fanout value dynamically increased to keep the 
required level of reliability. On the other hand, fanout decreased when repaired 
nodes re-entered the system. Note that the value represented by this curve is av- 
eraged over all non-failed nodes. This explains the notch in the graph at around 
time 350 when many nodes failed simultaneously. 
Fig. 4. Typical run of the adaptive algorithm.

Fig. 5. Run of the non-adaptive algorithm shown in Fig. 2 (f = 7).

Fig. 6. Fanout values.



5 Conclusions

In this paper we have presented an adaptive mechanism for epidemic
communication. This mechanism allows each node to adjust its fanout value to the
changes in the system, with the aim of increased resiliency to perturbations. In
the proposed mechanism, each node counts the number of times each broadcast
message has been received. An averaged number over recent broadcast messages
is used to estimate the status of the global system, while the number for the
latest broadcast initiated by each node is used to estimate the effect that that
node has. By using these two metrics the mechanism adjusts the fanout of each node
according to the changes in system conditions. This mechanism has its analogy
in biological functions of living organisms, like metabolism, where each
constituent part behaves mainly independently but is controlled indirectly by the
whole system. Using a simple model, we obtained preliminary experimental results.
The results show that the adaptive mechanism can adjust well to perturbations,
such as the simultaneous failure of many nodes.


Abstract. This work is part of the Blob computing project, whose goal
is to develop a new model of parallel machine, including a new model
of computation and a new machine. The idea of the whole project is to
capture the basic principles of bio-computing systems that allow massive
parallelism. The model of computation is based on the concept of a
self-developing network of compute nodes; the machine is a 2-D cellular
automaton grid whose evolution rule is fixed and implemented by simplified
physical laws. A machine configuration represents idealized physical
objects such as a membrane or a particle gas. A central object called the blob
is the hardware image of a compute node. Based on a published formal proof, this
paper first presents an implementation of the blob object using the
“programmable matter” platform for cellular automaton simulation. It then
describes an implementation of Blob division, the machine implementation
of compute node duplication. We used five different kinds of cellular
automaton rules, all explained in separate boxes. The result obtained can
be classified as a new, specific form of self-reproducing cellular
automaton. Unlike past examples of self-reproduction, it happens in parallel,
since the number of time steps necessary is proportional to √p, where
p measures the information (number of bits) contained in the object to
duplicate.



1 Introduction: The Blob Computing Project 

This work lies within the scope of the Blob Computing project [1], introduced in
“Blob Computing” (http://blob.lri.fr). The Blob Computing
Project aims to develop a new model of parallelism encompassing a programming
language, cellular encoding, that codes the development of a graph of cells, and
a machine, the blob machine, with a state that represents such a graph and
instructions that are able to develop it, together with a machine language. Both
the language and the machine are inspired by the biological developmental
process. They try to identify and implement basic principles that underlie the
massive parallelism of bio-computing systems.
1.1 The Programming Language

Cellular encoding codes the development of a graph of cells, starting from a single
ancestor cell and applying cell division instructions. Each cell has a copy of the
code and reads instructions at a specific location using a reading head. Apart from
dividing, cells can also exchange signals and perform computation on these signals
using finite memory. For example, a cell can simulate a simplified neuron; thus
a cellular code can develop and simulate an ANN (Artificial Neural Network).
Cellular encoding was designed in 1991 for use in genetic evolution of ANN. The 
goal was to automatically discover architecture with regularity matching the 
problem at hand. This was demonstrated on several difficult problems [2]. Cells 
can also merge, therefore the size of the cell graph can increase and decrease, 
depending on the requirements for memory and parallelism. Using this feature, 
a compiler has been designed for compilation from Pascal to cellular code [3] . It 
keeps the size of the code constant up to a small factor. This shows that cellular
encoding is as expressive as Pascal for specifying general purpose computation
problems. This is of particular interest, given the fact that there is no
arbitrarily large memory as in the traditional Von Neumann model. As soon as an
array or any other large data structure is needed, we just divide cells to create
extra memory, and memory is thus not inert.



1.2 The Machine Language 

Between 1996 and 2002, we looked for seven simple primitives summarizing
cellular encoding, simple enough to be implemented on a machine: {Send, Receive,
Copy node, Copy link, Delete link, Set link polarity}. A first presentation can
be found in [1]. Here, a node, or compute node, is a highly simplified version
of the cells used in cellular encoding. It is basically a real machine, i.e. a Finite
State Automaton with an output function delivering at each time step one of the
seven machine instructions. The first two instructions, send and receive, make it
possible to build a traditional network of automata with a fixed architecture.
The remaining five instructions allow the network of automata to “self develop”
by adding or removing nodes or links. The resulting machine language is implicitly
parallel, because once a compute node is copied, both copies can continue to
execute in parallel. Thus, starting from only one compute node, one can obtain
after some instructions a network realizing parallel operations. The major interest
of this model is the potential parallelism, a priori unlimited, brought by
self-development. The number of compute nodes can be increased by simple
instructions, and the nodes are able to continue to work in parallel. Figure 1
indicates how to program the sorting of a list using this model. The first compute
node receives the first value to sort. Then it is duplicated to make it possible to
store, in another compute node, the value which will be received at the next step.
The second value is then received by the first compute node, which transmits it
to the higher node just created if this value is higher than the one in store, or
stores it and transmits its former stored value to the higher node if the received
value is lower than the stored value. We see that compute nodes are added during
execution to store and sort the new values. The method is the same at each stage,
and when the last value is received, the list is already sorted, as we see at the
end of Figure 1.

Fig. 1. Sort program
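
The sorting scheme described above can be imitated sequentially in Python. This is only an illustrative sketch: the class `ComputeNode` and the function `sort_with_nodes` are hypothetical names, creating a new object stands in for the node duplication step, and the parallel execution of the nodes is not modeled.

```python
class ComputeNode:
    """A node that stores one value and links to the next-higher node."""

    def __init__(self):
        self.value = None
        self.higher = None

    def receive(self, v):
        if self.value is None:
            self.value = v  # the first value is simply stored
            return
        if self.higher is None:
            # Stand-in for duplication: make room for one more value.
            self.higher = ComputeNode()
        if v >= self.value:
            self.higher.receive(v)  # pass the larger value upward
        else:
            # Keep the smaller value and pass the former one upward.
            self.value, v = v, self.value
            self.higher.receive(v)


def sort_with_nodes(values):
    """Feed the values to the first node one by one, then read the chain."""
    first = ComputeNode()
    for v in values:
        first.receive(v)
    result, node = [], first
    while node is not None and node.value is not None:
        result.append(node.value)
        node = node.higher
    return result
```

Reading the chain from the first node upward yields the values in ascending order, because every node always keeps the smallest value it has seen and forwards the rest.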

1.3 The Machine 

This is the focus of our current research. To obtain a parallel machine, we must
solve the mapping problem: which processor will simulate which compute node.
The mapping must minimize communications by placing neighboring compute
nodes on nearby processors. Mapping is the central problem of parallelism;
it is NP-hard and thus is addressed using heuristics. We found our idea for the
machine by looking at how nature solves this problem. Nature is able to store
a few billion neurons and their 100 000 billion connections in the 2000 cm³
of our skull, which is certainly quite an astonishing placement achievement. How
does it work? During cell development, the movement of cells is influenced by
chemical gradients and electrostatic forces that may be genetically programmed.
However, behind this, if we go to the hardware mechanism, the first basic law that
determines cell position is the ever-present law of physics: the pressure and elastic
membrane forces exerted between cells. A machine trying to reproduce this placement
principle can be summarized in the following four ideas:

- The machine is a fine-grain simulation of a physical model, like the ones used
for weather prediction or fluid dynamics. Fine-grain means its structure can be
as elementary as a cellular automaton.

- Each compute node and each edge of the cell graph is attributed hardware
support consisting of a connected component of the automata network, as if it
were a physical object being simulated as such.

- This support is allowed to change through time, with as much freedom as possible,
as if the physical object were moving through the space of the network
(hardware-free representation).

- This freedom is exploited to optimize the placement of the soft graph on the
hard network, continuously, during the development of the soft graph. For example,
communicating compute nodes must be placed nearby. This optimization
can be handled by the machine itself simulating laws of pressure and elasticity.

1.4 The Contribution to the Machine 

In [4], the blob rule is introduced as a theoretical framework to represent
hardware-free data structures. A blob rule is a non-deterministic local rule on a
hardware network of arbitrary topology (we will use a 2D grid) that guarantees
the preservation of a global property of the network. This property states that
the nodes' colors evolve in time in such a way that connected colored components
can be identified through time. They move like color blobs, hence the term blob.
Not only is the number of color blobs preserved, but also topological relationships
between blobs, such as adjacency or encapsulation. We proved in [4] that it is
possible to represent a graph data structure using two kinds of blobs, red blobs
for nodes and green blobs for links, by keeping the adjacency relationship
between red blobs and green blobs, each green blob being adjacent to two red
blobs. One can use such a Blob system to represent a network of compute nodes
in a hardware-free way. We identify two important problems to solve in order to
achieve such a representation:

1 - The code and memory of a compute node should also be represented inside
its supporting blob, and the representation should also be hardware free.

2 - In order to develop the network, whether by adding new links or new compute
nodes, the basic operation that needs to be done is blob division, where
one blob gives birth to two blobs with identical code and memory.

This paper proposes a method to solve both problems, in the particular case where
the hardware network is a 2D grid and the update is synchronous, i.e. using a 2D
cellular automaton. The Blob, the implemented version of the compute node, is thus
represented on the 2-D grid of cellular automata by an area gathering several
automata. Its representation, similar to a spot of color if one associates a color
to each state of the automata of the grid, gave it its name, drawn from the English
word “Blob” (Figure 2). The solution that we advocate for problem 1 is to represent
the code and memory by a gas of particles. Within this area, some boxes of the grid
are in a particular state coding the presence of a particle; this state is
associated with a darker color. Each Blob can contain several hundred particles.
Each particle contains an elementary datum and an operator. The “composition” of
the Blob itself quickly suggests living cells. Indeed, the Blob is a moving set of
automata of the grid, kept connected by a rule which makes it possible to reproduce
the effect of a membrane without having to represent it. Moreover, the Blob
contains particles which carry the information and the code. Biological cells
likewise maintain connectedness (cohesion) by means of a membrane, and they contain
in their centre the cell genome distributed over the various chromosomes. The
objectives of Blob division being relatively similar to those of the division of
animal cells in biology, we studied the possibility of imitating some of these
mechanisms which seemed adapted to our needs within the framework of Blob division.
The suggested model carries out the duplication of compute nodes in a way similar
to biological cell division.

Fig. 2. Schematic implementation of a compute node on a 2D grid of automata

2 State of the Art 

Although Blob division arises directly from the need to specify mechanisms
for implementing the Blob machine, and directly meets the need to implement a
node copy, we believe it can be of interest to the broader community. First, it
is essential to locate the Blob Computing paradigm itself in the context of the
new paradigms. Lately, a considerable number of new models of computer
architecture have been proposed, deviating from the traditional “Von Neumann”
architecture. The Blob project is one of them, and one can currently regard it as
very distant from the paradigms of current processors. Second, Blob division is
basically a mechanism of self-reproduction on cellular automata, since from only
one Blob we wish to obtain two identical copies. Self-reproduction is not a
new or unknown field. Finally, we found it interesting to make an incursion
into the field of biology because, once we had determined the features of this
Blob division, we realized its similarity with the operations carried out during
mitosis in living cells. It was thus appropriate to study what had been done
on simulation in biology.

2.1 Other Paradigms of Massively Parallel Systems 

Systems including a machine approach. DeHon (Effectiveness) [5] consid- 
ers concrete applications of a massively parallel system, based on dynamically 
reprogrammable FPGA (DPGA). DeHon’s work also includes a language, but 
its scope of application seems centered on regular computation found in signal 
processing. 

- Amorphous computing [6] studies how to program quasi-regular,
quasi-synchronous nano-architectures that are resistant to breakdowns. Because
such architectures are even more scalable than a 2D automaton, we are thinking
of implementing Blobs on them as well.

Systems centered on language aspects. The Gamma project [7] (chemical
machine) defines a complete language exploiting a semantics of parallelism, but
does not consider the machine aspect implied by the chemical metaphor. Paun
[8] [9] focuses on the formal language aspects of “membrane systems”, but without
seeking to build a true machine or to work on its efficiency.

The principal interest of the Blob Computing project compared to these
approaches is that it includes both a concept of parallel language not
limited to regular computation (a network of self-developing computation) and a
concept of a scalable and effective machine (cellular automaton and “Blob”). The
Blob project is markedly biologically inspired, but tries to retain from biology
only what is relevant from a computational point of view.

2.2 Self-Reproduction on a 2-D Grid of Cellular Automata 

All the self-reproducing models presented until now only tend to colonize the
space of a 2-D grid of automata in an iterative way. Once copied, the duplicate
remains perfectly motionless. Each of the following models has its own
specificities, but none of them answers the needs that we have in Blob Computing.
For Blob division, the essential property required of the division
operation is the ability of the duplicates to place themselves. Moreover, all
suggested solutions perform reproduction sequentially, i.e. in time O(n) where n
is the size of the object to be duplicated, whereas we seek a time proportional
to the diameter of the object to be duplicated, that is O(√n). The pioneering
book of Von Neumann [10] on self-reproduction presents a complex model aiming
to make a whole machine replicate by itself. Langton [11] presents in his article
a very simplified mechanism in which a loop made up of unit elements, but of
fixed size, reproduces in its neighborhood until it has
completely colonized the space.

The research carried out by Tempesti directly follows Langton’s research on
self-reproducing loops, while giving them the possibility to vary in size and a
more effective programming. Our model of reproduction departs significantly from
the preceding ones: we duplicate the code and memory inside the Blob, i.e. all the
particles, and create 2 blobs, each with one set of particles [12].

2.3 Biology 

In the article [13] summarizing Potts’ work on the migration of cancerous cells,
one finds the first steps of the Blob concept: conservation of connexity,
self-placement and movements of cells. But most algorithms described in that paper
require global operations on all cells of the grid (for example, one supposes that
at each point of a simulated cell the total cell volume is known), which is
ruled out in a cellular automaton because connections are local. We
could not retain usable elements for our project, except the way of implementing
the surface tension phenomenon, which indeed uses a purely local rule [14].

3 Incursion in Biology

The principal source of inspiration for Blob division came to us from the
mitosis of living cells. Indeed, we noticed that it meets all the criteria that we
expected from the division of a Blob. Mitosis is able to duplicate the
whole genetic material in parallel, to separate the two copies of this duplicated
material and to form, starting from a mother cell, two new daughter cells, each
containing a copy of each chromosome.

Cell division is made up of stages which biologists distinguish with
precision. The genetic material of the cells is carried by the chromosomes. In
order to ensure that each of the two future cells resulting from division
possesses a complete copy of the genome, each chromosome is first duplicated
identically by a replication process, which takes place before mitosis proper.
Once the chromosomes are duplicated, a process based on filaments
separates the two copies of each chromosome and pulls them toward opposite poles:
this is the anaphase. When each copy of the chromosomes has reached its pole,
division can be finalized. Then an
operation which is not found in all living species takes place: cytokinesis during
the telophase, which causes a constriction of the cell at its equator until the
cell separates into two distinct cells (Figure 3).

Fig. 3. Three stages of the cellular division: Rest, Anaphase, Telophase

Mitosis inspired us in implementing the duplication of compute nodes with
Blob division. In this manner, we obtain a machine where the network of
compute nodes is placed and organized. Indeed, the Blob is an object which has
a certain autonomy: its physical support, meaning the set of automata of the grid
which make it up, changes after each iteration of the cellular automaton grid.
Visibly, Blobs move on the grid, contrary to the self-reproducing models cited
previously. It is this mobility of the Blob which is the essential property
making it possible for a Blob Computing system as a whole to be automatically
distributed on the grid of automata, in a homogeneous way and while minimizing
communications.

4 Implementation

4.1 Complexity in a Time Proportional to the Diameter 

First of all, it is important to insist on the local character of the
communications between boxes (automata) of the grid: in order to forward
information between two different boxes of the grid, this information must be
relayed by all the intermediate boxes located between the two boxes wanting to
communicate. Since the communication time is directly related to the distance
between the source and the target, it is essential to minimize this distance. A
first key property can be deduced from this: the minimal time necessary to
broadcast a signal (such as “start copy node”) to all the boxes forming part of a
Blob is proportional to the diameter of the Blob. This is because information is
transmitted from neighbor to neighbor, therefore from box to box. For example, the
minimal time required for a piece of information emitted by the box located
nearest to the center of a Blob to reach all the boxes located at its edge is
roughly half the diameter of this Blob. A Blob division algorithm whose time
complexity is proportional to the diameter is the best that can be done in this
context, and it is our goal. Although this communication scheme may appear very
slow, it does not prevent us from obtaining optimal complexity results in
simulation for such general-purpose algorithms as sorting and matrix
multiplication. Here, optimal should be understood with respect to VLSI
complexity. Of course, long-range connections could considerably improve
performance by a constant factor.



4.2 The SIMP Simulation Platform 

Since no implementation of the Blob had been made previously, we had to
choose a suitable simulation platform and to implement the Blob even
before thinking of working on its division. The simulation platform selected was
SIMP/Programmable Matter (http://pm.bu.edu), developed by Ted Bach and
Tommaso Toffoli [15]. SIMP makes it possible to simulate 2-D grids of cellular
automata and offers a graphic display, by associating a specific color with each
state of the automata. Most of the illustrations in this document are snapshots
of this display. Simulation in SIMP is made very fast by pre-compiling look-up
tables and by separating communication from computation. SIMP simulates
a synchronous cellular automaton, which is harder to program but much closer to a
real machine than an asynchronous cellular automaton.

4.3 Blob Implementation

As previously indicated, no Blob implementation had been made before, so this is
where we began. A Blob is a surface of the 2D grid of automata, but since this
Blob has to move to take part in the self-placement of the compute
network, how do we guarantee the cohesion and the connexity of all the boxes
belonging to a Blob? Indeed, if no rule ensures this cohesion, the Blob will
fall to pieces after any displacement. The Blob rule [4] ensures this cohesion
(see insert “Blob Rule”). It imitates the effect of a membrane around the Blob
without having to implement one. More importantly, this Blob rule is
composed only of local communications. It is thus this rule, whose correctness
was proven [4], that we implemented first. The Blob rule alone does not specify an
evolution, but a whole set of possible evolutions. It prohibits some transitions
and authorizes others. To obtain an evolution, one of the authorized transitions
is randomly elected, while taking care to preserve mutual exclusion, i.e. two
adjacent automata cannot execute transitions simultaneously. It is by introducing
chance that the impression of a moving blob is given (see the Movie section of
http://blob.lri.fr). The implemented Blob rule is enough to maintain the cohesion
of the automata boxes of the grid composing the Blob, but the form which this
first version of the Blob takes does not approach the disc by itself, but rather
the shape of a filamentous tree structure. However, it is towards the disc that we
want the surface of a Blob to tend, in order to limit its diameter and thus also
the execution time, as indicated in the first section of this “Implementation”
part. We associated with this Blob rule a smoothing rule which reproduces the
surface tension that one finds in physics [13,14]. This rule prevents outgrowths
on the surface of the Blob by a majority rule (see insert “Surface Tension
Rule by Smoothing”). Once this rule is implemented, the obtained Blob has a form
which indeed approaches the disc.
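The random election with mutual exclusion described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `elect_transitions` is hypothetical, adjacency is taken to mean the eight-box neighborhood, and a sequential greedy pass stands in for whatever local randomized mechanism the real cellular automaton uses.

```python
import random


def elect_transitions(candidates, rng):
    """Pick a random subset of authorized boxes with no two of them adjacent.

    candidates: set of (x, y) boxes whose change of state the Blob rule allows.
    rng: a random.Random instance (passed in so that runs are reproducible).
    Assumption: "adjacent" means touching in the 8-neighborhood.
    """
    order = sorted(candidates)   # fixed starting order, then shuffled
    rng.shuffle(order)
    chosen = []
    for x, y in order:
        # Accept the box only if it touches none of the boxes already elected.
        if all(max(abs(x - cx), abs(y - cy)) > 1 for cx, cy in chosen):
            chosen.append((x, y))
    return set(chosen)


if __name__ == "__main__":
    boxes = {(x, 0) for x in range(10)}
    print(sorted(elect_transitions(boxes, random.Random(0))))
```

Different seeds elect different subsets, which is what gives the Blob its apparent random motion in this reading of the rule.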

We then introduced particles within this Blob, as the model indicates, to store
the program code and data. The choice to give a pseudo-Brownian movement
to the particles has proved indispensable to give a dynamic to the Blob: it
ensures a homogeneous distribution of the particles by simulating pressure.
As a result, the size of a Blob rapidly increases after addition of particles, and
decreases after destruction of particles. The movement of the whole Blob is also
induced by giving the particles an impulse in a specific direction. In general,
pressure is the key ingredient for the automatic placement of Blobs in space.
This movement of each particle, similar to that of the particles of an ideal gas,
was obtained by coding the “HPP blob rule” of T. Toffoli [14]. The
particles have an effect on the probability of increasing or decreasing the Blob
size.

1) A square containing a particle cannot be removed from a Blob, otherwise
the particle would find itself outside the Blob.

2) The probability of enlarging the Blob is greater if the number of particles near
the border is greater. If no particles are present in the neighborhood, this
probability is set to zero, and growth is completely forbidden.
Blob Rule

The Blob rule [4] is a rule applied at each cycle to each box of the grid. It makes it possible to
determine whether a box forming part of the Blob may become a box not forming
part of it anymore and, conversely, whether a box not forming part of the Blob may
become a box of the Blob. This is done by an analysis of the direct neighborhood of
the box concerned, which makes it possible to check that the connexity will not be modified
locally after the addition or the removal of this box. In the accompanying figure, where
two Blobs are side by side, one studies the case of three particular boxes. The Blob rule
consists in counting the number of Blob components in the neighborhood of each of
these boxes. The neighborhood includes the eight boxes situated around the studied box,
excluding the studied box itself. As can be seen, the top box and the bottom box each have only
one connected component of Blob in their neighborhood; one can then authorize their change
of state because the Blob connexity will be preserved locally (hence globally). On the other
hand, one counts two connected Blob components around the middle box; one thus prohibits
its change of state. Indeed, if the change had been carried out, both Blobs would have
been joined and would have become one. Thus the Blob rule preserves the cohesion and
the connexity of the Blob as a whole by the simple application of a local rule which needs to
communicate only with the eight neighbors of each box.
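The local test described in this insert can be sketched in a few lines. The sketch below is an illustration, not the rule as proven in [4]: the names `count_components` and `may_change_state` are hypothetical, Blob boxes are stored as a set of coordinates, and two neighboring boxes are assumed to belong to the same component when they touch each other (8-adjacency) without passing through the studied box.

```python
# Offsets of the eight boxes around the studied box (the box itself is excluded).
RING = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]


def count_components(blob, x, y):
    """Count the connected groups of Blob boxes among the 8 neighbors of (x, y)."""
    members = [(x + dx, y + dy) for dx, dy in RING if (x + dx, y + dy) in blob]
    seen = set()
    components = 0
    for start in members:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:  # flood through the neighbors that touch each other
            cx, cy = stack.pop()
            for other in members:
                touching = max(abs(other[0] - cx), abs(other[1] - cy)) == 1
                if other not in seen and touching:
                    seen.add(other)
                    stack.append(other)
    return components


def may_change_state(blob, x, y):
    """Assumed reading of the rule: a change is authorized only with one component."""
    return count_components(blob, x, y) == 1


if __name__ == "__main__":
    # Two vertical Blobs side by side, separated by one empty column.
    two_bars = {(0, y) for y in range(3)} | {(2, y) for y in range(3)}
    print(may_change_state(two_bars, 1, 1))   # middle box between the two Blobs
    print(may_change_state(two_bars, -1, 1))  # box on the outer side of one Blob
```

The same test covers both directions of change: it forbids adding a box that would join two Blobs and forbids removing a box that locally holds two parts of a Blob together.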



Surface Tension Rule by Smoothing 




The smoothing rule makes it possible to reproduce the effect of surface tension, which helps
the Blob to take a form close to the disc. This rule rests on a simple strategy. It refines the Blob
rule by adjusting the probability of transitions already validated by the Blob rule. For all the
boxes whose change of state was authorized by the Blob rule, the smoothing rule favours
the changes of state which tend to give a round form and penalizes the changes which
would tend to stretch or deform the Blob.

This rule uses the same neighborhood as the Blob rule (eight neighbors), and counts how many
of these neighbors are part of the Blob and how many are not. If, for example, the requested
change is a box which wants to join the Blob and five of its eight neighbors are already
part of the Blob, the change of state will be largely favoured. If, on the other hand, a box wanting
to join the Blob has only one box already belonging to the Blob in its neighborhood, the
change will be completely prohibited, which prevents the outgrowths that would
draw the Blob away from its optimal disc shape.
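As a rough illustration of the smoothing rule, the sketch below turns the neighbor count into a transition weight. Only two points come from the text: a box joining with one Blob neighbor is prohibited, and a box joining with five Blob neighbors is strongly favoured. The function name `smoothing_weight`, the linear weighting and the symmetric treatment of boxes leaving the Blob are assumptions made here for the example.

```python
def smoothing_weight(blob_neighbours, joining):
    """Relative weight of an authorized transition under the smoothing rule.

    blob_neighbours: how many of the 8 neighbors belong to the Blob (0..8).
    joining: True if the box wants to join the Blob, False if it wants to leave.
    Assumption: the weight grows linearly with the number of neighbors that are
    already in the state the box wants to adopt, and is zero when at most one is.
    """
    if not 0 <= blob_neighbours <= 8:
        raise ValueError("a box has exactly eight neighbors")
    same = blob_neighbours if joining else 8 - blob_neighbours
    if same <= 1:
        return 0.0          # would create an outgrowth (or a notch): prohibited
    return same / 8.0       # majority-like preference for round shapes


if __name__ == "__main__":
    for k in range(9):
        print(k, smoothing_weight(k, joining=True))
```

These weights would then bias the random election among the transitions already authorized by the Blob rule.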
HPP blob rule

Rule HPP belongs to the family of lattice gas rules [14], which reproduce the
displacements of the particles of a perfect gas on a 2-D grid of cellular automata. This rule
has the very interesting feature of avoiding the storage of a direction and a momentum for each
gas particle. Instead, this direction and this movement are quite simply coded by the
position of the particle on the grid. The particles included in the Blob follow this rule,
which allows a considerable saving in automaton states.

This rule subdivides the 2-D grid of cellular automata into a meta-grid, each
meta-box of which includes four boxes of the grid of cellular automata. As
shown in the figure, the particles in the automaton located top left of the meta-box
always move towards the bottom right box, the particles bottom right always move
to the top left, those which are bottom left go to top right and finally those top right
move towards the bottom left corner. Once the displacement is carried out, the meta-grid
itself moves one box of the grid of cellular automata horizontally and also one box
vertically, which is a diagonal movement. Thus a particle located, for
example, initially in a top right box, and therefore moved to bottom left, is found,
after displacement of the meta-grid, again in a top right box, as initially. Its motion will
thus continue to the bottom left. The direction and the momentum are thus preserved
throughout the movement of the particles.

A specific case is nevertheless considered: the collision between two
particles, or of a particle against the border of the Blob. In this case, instead of carrying out
its diagonal movement, which is impossible for it, the colliding particle carries out
a vertical or horizontal displacement according to the free sites. (In the
case of a collision with the border of the Blob, the surface of the Blob will tend to extend
thanks to the increased probability of extending in the area of collision; this is the
algorithm that adapts the blob size according to the pressure.) Thus at the next step,
the direction of the move of the particle will not be the same, due to its new position
in the meta-grid. In this way, blocking situations are avoided. The movement is thus
preserved and achieves the desired dynamic for the Blob.
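The free-flight part of this rule (the diagonal move inside a 2x2 meta-box followed by the diagonal shift of the meta-grid) can be sketched as below. This is a simplified illustration: the function name `hpp_step`, the torus boundary and the `phase` parameter are assumptions, and the collision and Blob-border cases described above are deliberately left out.

```python
def hpp_step(particles, size, phase):
    """Move each particle to the opposite corner of its 2x2 meta-box.

    particles: set of (x, y) positions on a size x size torus (size must be even).
    phase: 0 or 1, the current diagonal offset of the meta-grid; the caller
           alternates it at every step to model the shift of the meta-grid.
    Collisions and Blob borders are NOT handled in this sketch.
    """
    if size % 2:
        raise ValueError("size must be even so that meta-boxes tile the torus")
    moved = set()
    for x, y in particles:
        ix = (x - phase) % 2            # 0 = left column of the meta-box, 1 = right
        iy = (y - phase) % 2            # 0 = top row of the meta-box, 1 = bottom
        nx = x + (1 if ix == 0 else -1)
        ny = y + (1 if iy == 0 else -1)
        moved.add((nx % size, ny % size))
    return moved


if __name__ == "__main__":
    gas = {(1, 0)}                      # starts in a top right box
    for step in range(4):
        gas = hpp_step(gas, 8, step % 2)
        print(step, sorted(gas))        # keeps travelling towards the bottom left
```

Because the four corners of a meta-box are exchanged pairwise, no two particles can land on the same box, and no per-particle direction has to be stored: the direction is recovered from the position relative to the current meta-grid.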



4.4 Theoretical Choices and Implementation of the Blob Division 

Before a Blob starts to divide, it is essential to inform the whole set of
automata boxes composing it that this division is beginning. This division-start
signal allows the particles contained in the Blob to duplicate, like the
chromosomes in biology. One copy of each particle will go into each of the two
Blobs obtained after division. It was thus necessary to create a mechanism for
diffusing the instructions, which is neither natural nor obvious because of the
local character of communications on the 2-D grid and the unbounded size of a
Blob. This mechanism
is based on a gradual flood-type communication which takes the visual aspect
of a square wave propagation. We added and centered a differentiated particle,
the master, which creates this wave. We centered the master for several reasons:
first, the wave can cross the Blob in a minimal time; most importantly, at the
end of division the master particle needs to be centered to perform
the final cut, which we will see below. The centering of this particle was
obtained using a wave reflected from the edge of the Blob, retracting until it
reaches the center and pushing the master to the center at the same time.



Diffusion of information by wave

The two wavefronts mentioned earlier are quite simply local rules aiming at
broadcasting information at the global level. The wave consists quite simply of a specific
state of the automaton which is transmitted gradually. In the figure, the wave (forward
as well as backward) is displayed in light gray, while all the other boxes which make up
the Blob but which do not have the information of the wave are displayed in dark gray.
The rule of diffusion for the forward wave is a flooding rule: any box in contact with
another box already possessing the information of the forward wave acquires this
information. The forward wave reaches the ends of the Blob in a time equal to half its
diameter.

The backward wave uses a more subtle mechanism. First, the backward wave is
generated by the forward wave: after coming into contact with the periphery of the Blob,
the forward wave is “reflected” into a backward wave. Second, this backward wave
does not propagate as the forward wave does. The backward wave follows the Blob rule
presented before, with the only difference that it is not hindered when it meets
particles, contrary to the retraction of the surface of the Blob itself. Thus the backward
wave acts as a Blob whose particles would all have suddenly vanished, removing any
pressure which the particles exerted on it, which results in a complete retraction of
this Blob. The backward wave thus retracts and, at the same time, pushes the master
particle to the center of the Blob. When this backward wave has completely retracted,
a new forward wave is emitted, and the forward wave / backward wave process cycles
indefinitely, like a clock whose cycle duration is proportional to the blob diameter.
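The flooding rule of the forward wave, and the claim that it covers the Blob in about half a diameter when emitted from the center, can be checked with a small sketch. The names `disc_cells`, `forward_wave_step` and `steps_to_cover` are hypothetical, the Blob is idealized as a discrete disc, and "in contact" is assumed to mean the eight-box neighborhood (which is also what gives the wave its square appearance). The backward wave is not modeled.

```python
def disc_cells(radius):
    """Boxes of an idealized disc-shaped Blob centered on (0, 0)."""
    return {(x, y)
            for x in range(-radius, radius + 1)
            for y in range(-radius, radius + 1)
            if x * x + y * y <= radius * radius}


def forward_wave_step(blob, informed):
    """One synchronous step: every Blob box touching an informed box is informed."""
    newly = set()
    for x, y in blob - informed:
        if any((x + dx, y + dy) in informed
               for dx in (-1, 0, 1) for dy in (-1, 0, 1)):
            newly.add((x, y))
    return informed | newly


def steps_to_cover(blob, master):
    """Number of steps needed for a wave emitted at `master` to reach every box."""
    if master not in blob:
        raise ValueError("the master must be inside the Blob")
    informed = {master}
    steps = 0
    while informed != blob:
        grown = forward_wave_step(blob, informed)
        if grown == informed:
            raise ValueError("the Blob is not connected")
        informed = grown
        steps += 1
    return steps


if __name__ == "__main__":
    blob = disc_cells(10)                      # diameter of about 20 boxes
    print(steps_to_cover(blob, (0, 0)))        # master at the center
    print(steps_to_cover(blob, (-10, 0)))      # master on the edge
```

With the master at the center the wave needs a number of steps equal to the radius, whereas a master sitting on the edge needs a full diameter, which illustrates why the master is kept centered.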

The first passage of the forward wave causes a duplication of each particle,
like the replication of the chromosomes. Thus, after this passage of the wave,
each particle contained in the Blob exists in two copies, each copy
being differently “polarized” thanks to a specific state of the automaton. As
soon as the particles are polarized, the + particles start to migrate towards the
+ pole thanks to the rule of cellular gravity [16], while the − particles migrate
towards the − pole by this same rule. Currently the axis of the + and − poles is
arbitrarily fixed on the up/down axis of our grid, but in the immediate future this
axis will be adjusted according to the links connecting a Blob to its neighbors
and the way these links are inherited by the child blobs.

The migration of the particles in two opposite directions creates a
zone of vacuum in the middle of the Blob, which causes a depression and thus a
retraction of the Blob in its equatorial zone, making the end of division possible.
This reflects the biological phenomenon called cytokinesis, which we already
mentioned. The equator is reduced to the point that its diameter measures just one
box. The master particle can locally check that division is about to finish. It
performs the final cut and is duplicated at the same time, each new master particle
finding itself in one of the child Blobs. Division ends and a wave of
depolarization is sent by the master in each Blob. A new division can begin again
if necessary. We completed this implementation, and the assumptions formulated at
the beginning were confirmed experimentally. We measured performance, varying
the number of particles included in the Blob, and the results are conclusive: the
time needed for division is indeed proportional to the Blob diameter.

Fig. 4. Polarity action on particles: Separation at the poles

5 Correctness and Results 

5.1 Correctness 

The correctness of the Blob division was not proven, but it is rather intuitive. It
rests upon the correctness of the Blob rule, which has been proven. In addition,
it is clear that the special form that the Blob takes, two balls separated by a
channel one box thick, is a “reachable” form, i.e. one can always “guide” chance
to arrive at this form: there is a nonzero probability of arriving there. Last,
because it is centered, the master particle finds itself in the channel. The figure
is indeed symmetrical, and the channel is thus in the middle. The master can locally
check that it is in the channel: when it finds two connected components of Blob in
its neighborhood, it can proceed to the final cut. It should be stressed that this
situation does not occur for the master before division, owing to the fact that
the master is centered in the Blob and thus does not touch the edge of the Blob.

Fig. 5. Graphic presentation of the division our program performs. Chronology left to
right.

It remains to check that when the master cuts, all the + particles are on one
side and the − particles on the other. In our experiments we observed that this
holds with a probability of 98 per cent, because the separation of the particles is
performed more quickly than the formation of the channel. However, it is possible
to refine the wave rule a little and obtain one hundred per cent correctness. For
that, it suffices that
the backward wave collects on its passage the charge of the particles it meets.
Thus the edge of that wave contains 2 bits of data corresponding to four distinct
cases:

- No particles were crossed.

- Only + particles (resp. − ) were crossed.

- Both + and − particles were crossed.

When the master is wedged in the channel, it will have to wait before cutting
until the wave which arrives on one of its sides indicates that there are only +
particles left there and the wave on the other side indicates that there are
only − particles left.
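The proposed refinement amounts to carrying two flag bits on the edge of the backward wave. The sketch below is one possible encoding, given purely as an illustration: the constants `PLUS` and `MINUS` and the functions `collect` and `master_may_cut` are hypothetical names, and the charges met by each wavefront are passed in as a simple sequence rather than gathered by a cellular automaton.

```python
PLUS = 0b01    # at least one + particle was crossed
MINUS = 0b10   # at least one - particle was crossed


def collect(charges):
    """OR together the charges met by one backward wavefront.

    charges: iterable of '+' or '-' characters, one per particle crossed.
    Returns a 2-bit value: 0 (none), PLUS, MINUS, or PLUS | MINUS (both).
    """
    seen = 0
    for charge in charges:
        if charge == '+':
            seen |= PLUS
        elif charge == '-':
            seen |= MINUS
        else:
            raise ValueError("charge must be '+' or '-'")
    return seen


def master_may_cut(side_a, side_b):
    """The master cuts only if one side reports only + and the other only -."""
    return {side_a, side_b} == {PLUS, MINUS}


if __name__ == "__main__":
    print(master_may_cut(collect("+++"), collect("--")))   # fully separated
    print(master_may_cut(collect("+-+"), collect("--")))   # a stray - particle
```

Since the flags are combined with a bitwise OR, each box of the wavefront only has to merge its own value with that of its neighbors, which keeps the refinement local.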

5.2 Results 

On the time graph (Fig. 6), one can note that the average time needed to carry
out division is O(√p), where p is the number of particles. This result is the
best possible and was our initial goal. It is simply explained by the
fact that each stage of division takes a time proportional to the diameter of
the blob, which itself is proportional to √p, since a Blob has a form close
to the disc and the density is constant (approximately 1/2, one box with a
particle for one empty box). Let us list these stages:

- The wave propagation is done at constant speed. It carries out a round trip in a
time proportional to the diameter.

- The article on Cellular Gravity [16] proves an upper bound on the time the
cellular gravity rule takes to pile up a set of particles located in a square into
a heap. A heap is obtained when each particle “rests” on the lower edge of the
rectangle or on three other particles. This upper bound is precisely three times
the diameter of the square.

- When one doubles the density of particles, a widening of the membrane results;
the time taken for this widening is also proportional to the diameter. This is an
experimental observation but can be justified: the number of boxes by which
the Blob grows at each time step is proportional to the diameter of the Blob,
since it grows uniformly along its whole border. The number of squares added per
step being proportional to √p, the number of time steps necessary
to enlarge the blob by an amount proportional to p is proportional to √p.

Fig. 6. Time graph
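The scaling argument above can be written out explicitly. The following is a sketch under the assumptions stated in the text (a disc-shaped Blob and a particle density of about 1/2); the symbols A (area of the Blob in boxes), d (its diameter), c (a rule-dependent constant for the growth of the border per time step) and T are introduced here only for illustration.

```latex
% p particles at density ~ 1/2 occupy an area of about 2p boxes; for a disc:
A \approx 2p, \qquad A = \frac{\pi d^{2}}{4}
\;\Longrightarrow\;
d \approx 2\sqrt{\frac{2p}{\pi}} \approx 1.6\,\sqrt{p}.

% Each stage of the division takes a time proportional to the diameter:
T_{\text{division}} = O(d) = O(\sqrt{p}).

% Widening: about c\,d boxes are added per step, and \Theta(p) boxes in total:
T_{\text{widening}} \approx \frac{\Theta(p)}{c\,d}
= \Theta\!\left(\frac{p}{\sqrt{p}}\right) = \Theta(\sqrt{p}).
```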



6 Future Projects 

Blob division is a first step of the implementation. Much work is needed
to simulate the complete machine. We plan to equip the Blob with a quite
distinct membrane and to no longer use only the Blob rule to maintain the
cohesion of the cells composing it. This should achieve a better control of
the surface tension of the Blob, of the blob shape and of the division process, and
especially enable us to bring several different Blobs into contact without
mixing their contents.

In the current version, if the particles are polarized with the same charge,
with the goal of making the blob move globally in one direction, then the
blob indeed moves. However, after some point two or more outgrowths may stretch
indefinitely out of it, without any possible recovery. We think the use of
a membrane should make it possible to control the blob diameter, and thus solve
this problem.

Currently, the division that we present is carried out arbitrarily along a vertical
axis, but as soon as we introduce this division into a more complete model of Blob
Computing, the axis of division will be self-organized according to the links
between Blobs and their polarity, which determines their inheritance by the child
blobs. The selected axis will be the one which passes through the point of greatest
+ polarity and the point of greatest − polarity, in order to self-direct the cutting
plane to separate the + link from the − link. Another important point remains
to be defined: the method of joining two Blobs by a specific link-Blob
which comprises “link” particles belonging both to the link-Blob and to the
Blob that it connects. This part of the model will make it possible to represent a
network of Blobs. Obviously, the greatest future project remains a Blob Computer.
Abstract. A method is presented to predict phase relationships
between coupled phase oscillators. As an illustration of how the method
can be applied, a distributed Central Pattern Generator (CPG) model
based on amplitude controlled phase oscillators is presented. Representative
results of numerical integration of the CPG model are presented
to illustrate its excellent properties in terms of transition speeds,
robustness and independence of initial conditions. A particularly interesting
feature of the CPG is the possibility to switch between different
stable gaits by varying a single parameter. These characteristics make the
CPG model an interesting solution for the decentralized control of
multi-legged robots. The approach is discussed in the more general framework
of coupled nonlinear systems, and design tools for nonlinear distributed
control schemes applicable to Information Technology and Robotics.



1 Introduction 

Information Technology has seen an unprecedented growth in possibilities and 
capacity in the second half of the 20th century. Powerful theories have emerged 
along with engineering principles that turn these theories into successful real 
world applications. Almost all of this progress has been made by adopting a linear 
and sequential approach to analyze and design systems. Under this view, each of 
the subsystems must be carefully engineered, in order to make them as reliable 
as possible. When connecting them, one is striving for a linear interaction as 
this allows one to guarantee that the prediction made for interacting subsystems 
remains valid. The order of operation, tasks and information flow is usually 
sequential as this simplifies the understanding of the mode of operation of the 
system and the identification of possible problems. 

This is in contrast to how natural systems work. In nature, the subsystems are 
usually unreliable, non-uniform, noisy but in huge numbers. The subsystems and 
their interaction are of active, nonlinear nature, leading to emergent phenomena 
on the system level. Therefore, these systems often work naturally in a parallel 
fashion. This tends to give interesting properties to natural systems such as 
robustness, fast computation, high energy efficiency and versatility despite slow,
noisy and unreliable components. In order to be able to construct systems with 
similar properties, it is crucial to have adequate theoretical tools for modeling 
and designing these complex systems. 

In this article, we would like to contribute to this effort in the field of
oscillatory systems. We develop a method for predicting phase relationships in
systems of coupled oscillators, and use it to design systems that can switch
between well-defined phase-locked states. In particular, we apply our approach
to a concrete example: the distributed control of locomotion in robots with
multiple degrees of freedom.

2 Designing Biologically Inspired Distributed Controllers 
for Walking Robots 

Controlling walking in robots has proved to be a difficult engineering challenge.
It requires coordinating multiple degrees of freedom using signals of the right
frequencies, phases, and amplitudes. As nature presents very robust and elegant
solutions to that problem, some engineers have turned to biology as a source of
inspiration. At first sight, the animal locomotory system seems to be of enormous
complexity. But, despite the large number of elements taking part in locomotion
control, a few simple common features have been observed by biologists among a
large variety of different species. One of these is the notion of the Central
Pattern Generator (CPG) [6,5]. A CPG is a network of neurons capable of producing
oscillatory signals without oscillatory inputs. For locomotion, CPGs are located
in the spinal cord, and receive relatively simple signals from higher centers of
the brain for the control of speed and direction. Sensory feedback is usually not
needed to produce the basic patterns, although it plays an important role in
adapting the patterns to the given situation the animal is faced with.

Another important concept is to classify different walking patterns by the
phase relationships between the individual limbs. This method makes it possible
to uncover striking similarities between the gait patterns observed in very
different animals. In quadruped locomotion there are three gait patterns that
are very often observed: walk, trot and bound.

Models of different complexity and based on different assumptions have been
devised that can produce the abstract gait patterns [1,19,3,21,20]. One important
approach, motivated by the oscillatory limb movements, is to use the simplest
mathematical model that produces stable oscillatory behavior as the gait
pattern generator for one limb. This mathematical model is a nonlinear oscillator
of some form. These oscillators are then connected together in order to achieve
inter-limb coordination (see [1,20,3]).

With the exception of [21,20], most previous models use nonlinear oscillators
that are motivated by neuronal circuits and that therefore have limit cycles with
irregular shapes. In this contribution, the point is made to use the simplest
oscillators possible as canonical subsystems, in order to have systems that are
well understood and are simpler to treat analytically. The canonical subsystem is
taken out of the class of nonlinear oscillators. In this article, the canonical
subsystem which serves as the model for the pattern generator of a single limb is
a simple amplitude controlled phase oscillator (ACPO). This choice of canonical
subsystem avoids the problems involved with the aforementioned neuronal
oscillators. The analytical treatment leads to an understanding of the system
behavior that makes it possible to apply synthetic approaches to construct a
network of these canonical subsystems with a desired global behavior.
Furthermore, the network is constructed to have one single parameter by which the
exhibited gait pattern can be controlled. This is a simplification compared to
previous approaches, which usually need several parameters to be changed at the
same time.

The desired properties that our CPG model should exhibit are the following.
First, the CPG model should be independent of initial conditions and robust
against perturbations. Second, the expressed gaits should ideally be controlled
by one simple control variable. This simplifies control, and also replicates the
biological observation that the modulation of a simple electrical stimulation
signal is sufficient to change gait in cats [18]. Finally, when the control
variable is changed, the CPG should exhibit fast transitions, ideally within one
cycle. The transitions are a critical moment, since the animal can lose its
stability if the transitions are not appropriate. Furthermore, fast transitions
are also observed in nature. To the best of our knowledge, there has so far been
no simple model that fulfills all the criteria just stated.

3 Outline 

A short outline of our approach is as follows. First, the canonical
subsystem will be presented. The notion of the phase will be introduced, since
the phase is crucial for understanding synchronization behavior. Then, it will
be shown that the sensitivity of the phase to perturbations can be derived by
examining the form of the limit cycle. With this result, it will be shown how
the phase relationship between two unidirectionally coupled oscillators can be
derived. Out of the insights gained by that treatment, a method is presented to
choose an arbitrary phase relationship between the two oscillators.

Next, a quadruped walking controller composed of four coupled oscillators
will be constructed. The additional couplings give rise to additional constraints
on the phase relationships. It will be shown by numerical experiments that only
phase relationships that fulfill these constraints are stable. In a next step, it
will be shown how we can exploit these additional constraints to obtain a
continuous-valued parameter that allows us to choose the gait pattern expressed.
In the discussion we show how the results presented in this article fit into the
larger picture and show that our assumptions and simplifications are based on
firm theoretical grounds.

4 A Distributed Quadruped Central Pattern Generator

4.1 Predicting the Phase Between Two Oscillators 

Our goal in this section is to predict how an oscillator reacts to perturbations 
by looking at its limit cycle from a geometrical point of view, and to use this 
prediction for determining the phase relationship between coupled oscillators. 

To start with the concepts needed to discuss nonlinear oscillators, the notion
of a perturbed nonlinear dynamical system is introduced:

$$\dot{\mathbf{q}} = F(\mathbf{q}) + \mathbf{p} \qquad (1)$$

where $\mathbf{q}$ is the vector of state variables and $\mathbf{p}$ a perturbation vector. If
the unperturbed system ($\mathbf{p} = 0$) converges to a periodic solution, it is called an
oscillator and the set of $\mathbf{q}$ on which it continues to evolve is called the limit
cycle of the system. As described in [15], every oscillator can be transformed
into a phase ($\theta$) - radius ($r$) coordinate system:

$$\dot{\theta} = \omega_0 + p_\theta \qquad (2)$$

$$\dot{r} = F_r(r, \theta) + p_r \qquad (3)$$

where $\omega_0$ is the natural frequency of the (unperturbed) oscillator, $F_r$ is the
dynamical system describing the evolution of $r$, $p_\theta$ is the component of the
perturbation acting on the phase and $p_r$ is the component of the perturbation
acting in the direction of the radius. Perturbations on a stable limit cycle have
different effects on the phase depending on the $p_\theta$ and $p_r$ components. The $p_\theta$
component will modify the phase, since the phase is marginally stable [15]. The
$p_r$ component, on the other hand, acts in the direction of the radius; it will be
damped out and will have little effect on the phase.

When two oscillators ($F_1$, $F_2$ with corresponding state vectors $\mathbf{q}_1$, $\mathbf{q}_2$) are
coupled together ($\mathbf{p}_2 = f(\mathbf{q}_1)$), several types of dynamics can result, including
chaos (i.e. no periodic behavior) and phase-coupling. In this article, we are
interested in 1:1 phase-locked regimes, i.e. when the oscillators synchronize
such that [15]

$$\theta_d = \theta_2 - \theta_1 \approx \mathrm{const} \qquad (4)$$

Assuming that the system has phase-coupled^, we are now interested in how to
predict $\theta_d$ given two oscillators and their coupling. The general outline of our
method is as follows:

1. From the limit cycle of the perturbed oscillator a sensitivity function $S_p(\mathbf{p})$
is derived.

2. From the limit cycle of the perturbing system (the other oscillator), the
coupling and the sensitivity function, the perturbation term $p_\theta$ is calculated.
$p_\theta$ is usually a function of the phase difference and the phase of the perturbed
system.

3. From the requirement of phase synchronization (4) a differential equation
(DE) for the phase difference between the perturbed and the perturbing
system ($\theta_d$) can be derived. This DE is usually a function of the phase
difference and the phase of the perturbed system.

4. By integrating $p_\theta$ over the evolution of the perturbed limit cycle, the
perturbation of the phase that stays in the system is computed. This makes it
possible to derive a DE that depends only on $\theta_d$. With the help of that DE, the
fixed points for $\theta_d$ can be found.

5. By applying a stability analysis to the DE, the stable and unstable fixed
points can be distinguished.

^ Determining which conditions are necessary for phase-coupling is outside the
scope of the current article.



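When looking at the phase space representation of a nonlinear dynamical system
we can conclude that changes in the derivative of the phase can only stem from
components of the perturbation that are in the direction of $\theta$, i.e. tangential
to the limit cycle. The unit vector tangential to the limit cycle is

$$\mathbf{e}_\theta = \frac{\dot{\mathbf{q}}}{|\dot{\mathbf{q}}|} \qquad (5)$$

Therefore, the effective perturbation on the phase is

$$p_\theta = \mathbf{p} \cdot \mathbf{e}_\theta \qquad (6)$$

The derivative of the phase becomes

$$\dot{\theta} = \omega_0 + \mathbf{p} \cdot \mathbf{e}_\theta \qquad (7)$$

So we found the sensitivity of the phase to perturbations:

$$S_p(\mathbf{p}) = \frac{\mathbf{p} \cdot \mathbf{e}_\theta}{|\mathbf{p}|} \qquad (8)$$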
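The projection onto the tangent of the limit cycle can be illustrated with a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes that the sensitivity is the normalized projection $(\mathbf{p} \cdot \mathbf{e}_\theta) / |\mathbf{p}|$, and it uses a circular limit cycle traversed counterclockwise as a hypothetical example. The function names are illustrative only.

```python
import math

def tangent_unit_vector(q_dot):
    """Unit vector e_theta = q_dot / |q_dot|, tangential to the limit cycle."""
    norm = math.hypot(q_dot[0], q_dot[1])
    return (q_dot[0] / norm, q_dot[1] / norm)

def phase_perturbation(p, q_dot):
    """Effective perturbation on the phase, p_theta = p . e_theta."""
    e = tangent_unit_vector(q_dot)
    return p[0] * e[0] + p[1] * e[1]

def phase_sensitivity(p, q_dot):
    """Assumed form of the sensitivity: (p . e_theta) / |p|."""
    return phase_perturbation(p, q_dot) / math.hypot(p[0], p[1])

# Assumed example: on a circular limit cycle traversed counterclockwise,
# the flow at phase theta points along (-sin(theta), cos(theta)).
theta = math.pi / 3
flow = (-math.sin(theta), math.cos(theta))
radial_kick = (math.cos(theta), math.sin(theta))                    # along the radius
tangential_kick = (-0.5 * math.sin(theta), 0.5 * math.cos(theta))   # along the flow

print("radial kick:    ", phase_sensitivity(radial_kick, flow))
print("tangential kick:", phase_sensitivity(tangential_kick, flow))
```

A purely radial perturbation has (numerically) no component along the flow, whereas a perturbation along the flow acts on the phase with its full magnitude.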



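With (7) we found an explicit form for the time evolution of $\theta$. By using the
definition in (4) we can derive a differential equation for $\theta_d$. We require
synchronization after some transient phase, which is not discussed here:

$$\int \dot{\theta}_d \, dt = 0 \qquad (9)$$

On the other hand

$$\dot{\theta}_d = \dot{\theta}_2 - \dot{\theta}_1 \qquad (10)$$

$$\phantom{\dot{\theta}_d} = \omega_{0,2} + p_{\theta_2} - (\omega_{0,1} + p_{\theta_1}) \qquad (11)$$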

This is usually a function of $\theta_d$ and $\theta_2$. As we are mainly interested in the
steady state of the system, we integrate it over time. The integration over time
is done implicitly by integration over $\theta_2$ (which increases monotonically with
time), after the system has reached the steady phase-locked state. We assume the
criterion for phase locking to be fulfilled at $\theta_2 = \theta_0$, and that the system
subsequently is in the steady phase-locked state for $\theta_2 > \theta_0$. From (9) we see
that the integral over the steady state, and hence over each full cycle of $\theta_2$,
should be zero

$$\theta_{d,res} = \int_{2(n-1)\pi + \theta_0}^{2n\pi + \theta_0} \dot{\theta}_d \, d\theta_2 = 0 \qquad (12)$$

We have now outlined all the steps needed to arrive at a differential equation
for $\theta_d$. In the following, we will show the analysis of the phase oscillator that
will be used to construct the CPG.
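Steps 3 to 5 of the method can also be carried out numerically when no closed form is available. The following Python sketch averages a given $\dot{\theta}_d(\theta_d, \theta_2)$ over one cycle of $\theta_2$, locates the zeros of the averaged drift and classifies them by the sign of its slope. The function `example_drift` and its parameter values are assumptions chosen purely for illustration; the helper names are hypothetical.

```python
import math

def averaged_drift(theta_d_dot, theta_d, n=200):
    """Integrate theta_d_dot(theta_d, theta_2) over one cycle of theta_2 (midpoint rule)."""
    h = 2.0 * math.pi / n
    return sum(theta_d_dot(theta_d, (k + 0.5) * h) for k in range(n)) * h

def fixed_points(theta_d_dot, samples=720):
    """Find zeros of the averaged drift on [0, 2*pi) and classify their stability."""
    grid = [2.0 * math.pi * k / samples for k in range(samples + 1)]
    drift = [averaged_drift(theta_d_dot, th) for th in grid]
    found = []
    for k in range(samples):
        a, b = drift[k], drift[k + 1]
        if a == 0.0 or a * b < 0.0:
            # Linear interpolation of the zero crossing between two grid points.
            t = 0.0 if a == 0.0 else a / (a - b)
            root = grid[k] + t * (grid[k + 1] - grid[k])
            # A falling drift pushes theta_d back to the fixed point: stable.
            found.append((root, "stable" if b < a else "unstable"))
    return found

def example_drift(theta_d, theta_2, omega_d=0.0, lam=2.0, r=1.0):
    """Assumed illustrative drift: a constant plus a slow and a fast cosine term."""
    return omega_d + 0.5 * lam * r * (math.cos(theta_d) + math.cos(2.0 * theta_2 - theta_d))

for root, kind in fixed_points(example_drift):
    print(f"theta_d = {root:.4f} rad ({kind})")
```

The term that oscillates with $2\theta_2$ averages out over a full cycle, so only the part that depends on $\theta_d$ alone determines the fixed points.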



4.2 The Amplitude Controlled Phase Oscillator 

As outlined before, the CPG model will be constructed from simple canonical
subsystems. In this case the subsystems are amplitude controlled phase
oscillators (ACPO). The ACPO is defined by the following dynamical system:

$$[\dot{\theta}, \dot{r}]^T = [\omega, -g(r - r_0)]^T \qquad (13)$$

The description of this system can be transformed into an equivalent description
in the Cartesian coordinate system ($x = r \cos\theta$, $y = r \sin\theta$):

$$\dot{\mathbf{q}} = \begin{bmatrix} \dot{x} \\ \dot{y} \end{bmatrix} = \begin{bmatrix} g \left( \frac{r_0}{\sqrt{x^2 + y^2}} - 1 \right) x - \omega y \\ g \left( \frac{r_0}{\sqrt{x^2 + y^2}} - 1 \right) y + \omega x \end{bmatrix} \qquad (14)$$

A shorthand notation for this system is introduced: $\dot{\mathbf{q}} = F_{ACPO}(\mathbf{q})$, where
$\mathbf{q} = [x, y]^T$ is the state vector of the system. This system shows a limit cycle
that has the form of a perfect circle with radius $r_0$ (Fig. 1(a)). The intrinsic
frequency of the oscillator is $\omega$.
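The convergence to the circular limit cycle is easy to check numerically. The sketch below integrates the Cartesian form of the ACPO with a fixed-step fourth-order Runge-Kutta scheme; the integrator, the step size, the integration time and the initial conditions are assumptions made for this illustration (the parameter values $r_0 = 1$, $g = 10$, $\omega = 2\pi$ are those quoted for Fig. 1(a)).

```python
import math

def acpo(q, omega=2.0 * math.pi, g=10.0, r0=1.0):
    """Right-hand side of the ACPO in Cartesian coordinates (undefined at the origin)."""
    x, y = q
    r = math.hypot(x, y)
    radial = g * (r0 / r - 1.0)  # pulls the radius back towards r0
    return (radial * x - omega * y, radial * y + omega * x)

def rk4_step(f, q, dt):
    """One classical fourth-order Runge-Kutta step for dq/dt = f(q)."""
    k1 = f(q)
    k2 = f(tuple(qi + 0.5 * dt * ki for qi, ki in zip(q, k1)))
    k3 = f(tuple(qi + 0.5 * dt * ki for qi, ki in zip(q, k2)))
    k4 = f(tuple(qi + dt * ki for qi, ki in zip(q, k3)))
    return tuple(qi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for qi, a, b, c, d in zip(q, k1, k2, k3, k4))

def simulate(q0, t_end=5.0, dt=1e-3):
    """Integrate the ACPO from q0 and return the final state."""
    q = q0
    for _ in range(int(round(t_end / dt))):
        q = rk4_step(acpo, q, dt)
    return q

# Two assumed initial conditions, one inside and one outside the limit cycle.
for q0 in [(0.1, 0.0), (2.0, -1.5)]:
    x, y = simulate(q0)
    print(q0, "-> final radius", round(math.hypot(x, y), 6))
```

Both trajectories end on the circle of radius $r_0$, independently of where they started; only the phase on the circle depends on the initial condition.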



4.3 Two Coupled ACPO 

We now introduce a system of two ACPOs in which one ACPO is coupled
unidirectionally to the other:

$$\dot{\mathbf{q}}_1 = F_{ACPO}(\mathbf{q}_1) \qquad (15)$$

$$\dot{\mathbf{q}}_2 = F_{ACPO}(\mathbf{q}_2) + \mathbf{p}_c(\mathbf{q}_1) \qquad (16)$$

where $\mathbf{q}_1 = [x_1, y_1]^T$, $\mathbf{q}_2 = [x_2, y_2]^T$. Next we show how the phase
relationship $\theta_d$ can be derived from knowledge of the shape of the limit cycle
and $\mathbf{p}_c$. We do this analytically to illustrate how the method works. However,
the method is not limited to cases where the form of the limit cycle is known by
analytical derivation; it also works when the form of the limit cycle and $\mathbf{p}_c$
are obtained by numerical integration. To illustrate how the method works,
consider the simple connection scheme:

$$\mathbf{p}_c = \lambda [0, x_1]^T \qquad (17)$$

In words: state variable $x$ of ACPO 1 is coupled to the derivative of state $y$
of ACPO 2 with a coupling constant $\lambda$.

Fig. 1. a) Limit cycle of the amplitude controlled phase oscillator for $r_0 = 1$,
$g = 10$, $\omega = 2\pi$ [rad s$^{-1}$]. The arrows show the flow $\dot{\mathbf{q}}$ defined by $F_{ACPO}$ (13).
b) The phase difference established for $\omega_d = -0.0042$, $\lambda = 2$ and $g = 1000$. With
the help of (23), the predicted value is $\theta_d = 0.2493$ (dashed line). The value
from numerical integration is shown with the solid line (mean over $t = [10, 20]$
is $\theta_d = 0.2554$). c) The structure of the ACPO-CPG. Note that the connections
illustrated by arrows involve rotation matrices (compare to text).

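1. We derive

$$\mathbf{e}_{\theta_2} = \frac{\dot{\mathbf{q}}_2}{|\dot{\mathbf{q}}_2|} = [-\sin(\theta_2), \cos(\theta_2)]^T \qquad (18)$$

2. The coupling term is

$$\mathbf{p}_c = \lambda [0, x_1]^T = \lambda [0, r \cos(\theta_1)]^T \qquad (19)$$

From (6) we get

$$p_{\theta_2} = \lambda r [0, \cos(\theta_1)]^T \cdot [-\sin(\theta_2), \cos(\theta_2)]^T = \frac{\lambda r}{2} [\cos(\theta_d) + \cos(2\theta_2 - \theta_d)] \qquad (20)$$

3. Using (10) and (20)

$$\dot{\theta}_d = \omega_{0,2} - \omega_{0,1} + \frac{\lambda r}{2} [\cos(\theta_d) + \cos(2\theta_2 - \theta_d)] \qquad (21)$$

4. From (12) and (21) we get, with $\omega_d = \omega_{0,2} - \omega_{0,1}$,

$$\theta_{d,res} = 2\pi\omega_d + \lambda r \pi \cos(\theta_d) = 0 \qquad (22)$$

From this equation we can calculate the (averaged) fixed points for $\theta_d$

$$\theta_d = \arccos\left( -\frac{2\omega_d}{\lambda r} \right) \qquad (23)$$

We note that we need $|2\omega_d / (\lambda r)| < 1$ for this particular system to phase-lock
(i.e. for (23) to have equilibrium points). Since we assume the steady
phase-locked state, $r \approx r_0$ can be assumed. We are interested in the stable fixed
points, since they determine to which phase relationship the system will evolve.
For example, for $\omega_d = 0$, we find solutions at $\frac{\pi}{2} + n\pi$, $n \in \mathbb{Z}$.

5. The stability of the fixed points is determined by the one-dimensional
Jacobian for $\theta_d$, which can be obtained by differentiating (22)

$$\frac{\partial \theta_{d,res}}{\partial \theta_d} = -\lambda r \pi \sin(\theta_d) \qquad (24)$$

From this equation we can calculate that $\partial \theta_{d,res} / \partial \theta_d = -\lambda r \pi < 0$ for
$\theta_d = \frac{\pi}{2} + 2n\pi$ and $\partial \theta_{d,res} / \partial \theta_d = \lambda r \pi > 0$ for $\theta_d = \frac{3\pi}{2} + 2n\pi$ (for
$\lambda > 0$, opposite if $\lambda < 0$). Therefore, for $\omega_d = 0$ only the phase differences
$\theta_d = \frac{\pi}{2} + 2n\pi$ are stable fixed points.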

Using (23) we can therefore determine the phase difference to which the two
oscillators evolve when coupled, under the assumption that they phase-lock. For
$\omega_d \neq 0$ the fixed points for $\theta_d$ have slightly different values and depend on
the choice of $r_0$, as can be seen from (23). In Fig. 1(b) the results of numerical
integration of the system treated above are presented for $\omega_d \neq 0$ and compared
to the value predicted by the analytical treatment.
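The comparison between prediction and simulation can be reproduced in a few lines. The following Python sketch is an illustration under stated assumptions, not the authors' code: it takes the fixed point to be $\theta_d = \arccos(-2\omega_d / (\lambda r))$ with $\omega_d = \omega_{0,2} - \omega_{0,1}$, uses a fixed-step Runge-Kutta integrator, and the values $g = 10$, $\lambda = 2$, $r_0 = 1$, the initial state and the averaging window are choices made for this example.

```python
import math

OMEGA, G, R0, LAM = 2.0 * math.pi, 10.0, 1.0, 2.0  # assumed example values

def acpo(x, y, omega):
    """ACPO vector field in Cartesian coordinates."""
    r = math.hypot(x, y)
    radial = G * (R0 / r - 1.0)
    return radial * x - omega * y, radial * y + omega * x

def rhs(s, omega_d):
    """Oscillator 1 runs freely; oscillator 2 receives p_c = LAM * [0, x1]."""
    x1, y1, x2, y2 = s
    dx1, dy1 = acpo(x1, y1, OMEGA)
    dx2, dy2 = acpo(x2, y2, OMEGA + omega_d)
    return (dx1, dy1, dx2, dy2 + LAM * x1)

def rk4_step(s, dt, omega_d):
    """One classical fourth-order Runge-Kutta step."""
    def advanced(k, c):
        return tuple(si + c * dt * ki for si, ki in zip(s, k))
    k1 = rhs(s, omega_d)
    k2 = rhs(advanced(k1, 0.5), omega_d)
    k3 = rhs(advanced(k2, 0.5), omega_d)
    k4 = rhs(advanced(k3, 1.0), omega_d)
    return tuple(si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def locked_phase_difference(omega_d=0.0, t_end=20.0, t_avg=5.0, dt=2e-3):
    """Circular mean of theta_d = theta_2 - theta_1 over the last t_avg seconds."""
    s = (1.0, 0.0, 1.0, 0.0)
    n, n_avg = int(round(t_end / dt)), int(round(t_avg / dt))
    c_sum = s_sum = 0.0
    for k in range(n):
        s = rk4_step(s, dt, omega_d)
        if k >= n - n_avg:
            d = math.atan2(s[3], s[2]) - math.atan2(s[1], s[0])
            c_sum += math.cos(d)
            s_sum += math.sin(d)
    return math.atan2(s_sum, c_sum)

def predicted_phase_difference(omega_d, lam=LAM, r=R0):
    """Assumed fixed point formula: theta_d = arccos(-2 * omega_d / (lam * r))."""
    return math.acos(-2.0 * omega_d / (lam * r))

print("predicted [rad]:", predicted_phase_difference(0.0))
print("simulated [rad]:", locked_phase_difference(0.0))
```

The simulated value is a mean over several cycles, because $\theta_d$ keeps oscillating slightly around its average within each cycle. A small offset between the averaged simulation and the prediction is therefore expected, in line with the two values quoted for Fig. 1(b).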



4.4 Method for Choosing Arbitrary 9d 

Based on the insight gained in the previous section, a method is presented to
choose an arbitrary $\theta_d$. To this end, a more general coupling scheme is
introduced:

$$\mathbf{p}_2 = \lambda P \mathbf{q}_1 \qquad (25)$$

where $P$ is the coupling matrix. In the aforementioned example (17) it would be

$$P = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \qquad (26)$$

We define a rotation matrix

$$R = \begin{bmatrix} \cos\theta_R & -\sin\theta_R \\ \sin\theta_R & \cos\theta_R \end{bmatrix} \qquad (27)$$

By taking $\mathbf{q}_{1,R} = R\mathbf{q}_1$, we get a vector that is equivalent to the vector $\mathbf{q}_1(\theta_1')$,
with $\theta_1' = \theta_1 + \theta_R$. In other words, if we take $\mathbf{q}_{1,R}$ to perturb the second
oscillator, the effect is the same as if the first oscillator were in state $\theta_1'$.
Thus,

$$\mathbf{p}_2 = \lambda P \mathbf{q}_{1,R} = \lambda P R \mathbf{q}_1 = \lambda r [0, \cos\theta_1 \cos\theta_R - \sin\theta_1 \sin\theta_R]^T \qquad (28)$$

Using the same approach as in (20)-(23) we get

$$\theta_{d,res} = 2\pi\omega_d + \lambda r \pi [\cos\theta_d \cos\theta_R - \sin\theta_d \sin\theta_R] = 0 \qquad (29)$$



By exploiting the trigonometric addition theorems this transforms into

$$\theta_d \big|_{\theta_{d,res} = 0} = \arccos\left( -\frac{2\omega_d}{\lambda r} \right) - \theta_R \qquad (30)$$

where again $r \approx r_0$ in the steady state. As can be seen, $\theta_d$ depends linearly on
the rotation angle $\theta_R$. Using (30) we can design couplings between the
oscillators so as to obtain an arbitrary phase difference between them. Note
that the coupling does not need to be unidirectional. It is straightforward to
introduce bidirectional coupling by changing (15) to

$$\dot{\mathbf{q}}_1 = F_{ACPO}(\mathbf{q}_1) + \mathbf{p}_c(\mathbf{q}_2) \qquad (31)$$

and working out the math as outlined above. Analogously to (28), a second
rotation matrix $R_2$ is introduced. Therefore, a third additive term arises in (29).
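The effect of the rotation matrix on the locked phase difference can be checked by simulation. The sketch below is a hedged illustration: the integrator, the parameter values and the rotation angle $\theta_R = \pi/3$ are assumptions, and the coupling matrix is taken to be the one of the simple example (only $x$ of the rotated state acts on $\dot{y}_2$). Whether the shift appears as $+\theta_R$ or $-\theta_R$ depends on the sign conventions for $\theta_d$ and for the sense of rotation; with the conventions assumed here ($\theta_d = \theta_2 - \theta_1$, counterclockwise $R$) the simulated shift comes out as $+\theta_R$.

```python
import math

OMEGA, G, R0, LAM = 2.0 * math.pi, 10.0, 1.0, 2.0  # assumed example values

def acpo(x, y):
    """ACPO vector field in Cartesian coordinates."""
    r = math.hypot(x, y)
    radial = G * (R0 / r - 1.0)
    return radial * x - OMEGA * y, radial * y + OMEGA * x

def rhs(s, theta_r):
    """Oscillator 2 is perturbed by LAM * P * R(theta_r) * q1, with P = [[0, 0], [1, 0]]."""
    x1, y1, x2, y2 = s
    x1_rot = math.cos(theta_r) * x1 - math.sin(theta_r) * y1  # first component of R q1
    dx1, dy1 = acpo(x1, y1)
    dx2, dy2 = acpo(x2, y2)
    return (dx1, dy1, dx2, dy2 + LAM * x1_rot)

def rk4_step(s, dt, theta_r):
    """One classical fourth-order Runge-Kutta step."""
    def advanced(k, c):
        return tuple(si + c * dt * ki for si, ki in zip(s, k))
    k1 = rhs(s, theta_r)
    k2 = rhs(advanced(k1, 0.5), theta_r)
    k3 = rhs(advanced(k2, 0.5), theta_r)
    k4 = rhs(advanced(k3, 1.0), theta_r)
    return tuple(si + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for si, a, b, c, d in zip(s, k1, k2, k3, k4))

def locked_phase_difference(theta_r, t_end=20.0, t_avg=5.0, dt=2e-3):
    """Circular mean of theta_d = theta_2 - theta_1 over the last t_avg seconds."""
    s = (1.0, 0.0, 1.0, 0.0)
    n, n_avg = int(round(t_end / dt)), int(round(t_avg / dt))
    c_sum = s_sum = 0.0
    for k in range(n):
        s = rk4_step(s, dt, theta_r)
        if k >= n - n_avg:
            d = math.atan2(s[3], s[2]) - math.atan2(s[1], s[0])
            c_sum += math.cos(d)
            s_sum += math.sin(d)
    return math.atan2(s_sum, c_sum)

def wrap(angle):
    """Map an angle into the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))

theta_r = math.pi / 3
shift = wrap(locked_phase_difference(theta_r) - locked_phase_difference(0.0))
print("rotation angle [rad]:", theta_r)
print("measured shift [rad]:", shift)
```

Comparing the rotated and the unrotated run, rather than comparing each with a formula, isolates the effect of $R$: any small averaging offset is common to both runs and cancels in the difference.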



4.5 The ACPO CPG 

The three most common gaits observed in quadrupeds are walk, trot and bound.
To ease the notation, the legs of the quadruped are numbered in the following
way: left front 1, left hind 2, right hind 3, right front 4 (cf. Fig. 1(c)). If we
define $\theta_{d,ij} = \theta_i - \theta_j$ as the difference between the phases of limbs $i$ and $j$,
then the gaits can be classified according to Table 1(a) (the phases are
normalized: $\theta = 1$ corresponds to the full circle).

A quadruped CPG is constructed from four fully connected ACPO, i.e. all 
oscillators are coupled bidirectionally to every other one (see Fig. 1(c)). The 
coupling matrix is of the form 




and $\lambda = 2$ for all connections. A ring structure is in principle sufficient to
build the CPG, cf. [16, 1]. However, the additional, redundant connections
increase the speed of the gait transitions.

Let us outline how we can design specific gait patterns into this network. First
of all, for a given gait pattern the phase differences between the pairs of
oscillators that are connected need to be known. We can derive these phase
differences with the help of Table 1(a). Then, for each connection a
corresponding rotation matrix can be derived. If we take the walk pattern as an
example, we see that we come up with four different rotation matrices
($\theta_d = \pm 0.25, \pm 0.5$) for the 12 connections. In order to be able to change from
one gait pattern to another, we make the rotation matrices dependent on a
parameter $p_{gait}$ and we exploit the fact that $R(\theta_d) = R^{-1}(-\theta_d)$. By analysis of
the requirements needed to generate walk, trot and bound we come up with three
parameter sets of $\theta_R$ (cf. Table 1(b)). Instead of fixing the $\theta_d$ we can define
continuous functions that provide these values when a parameter $p_{gait}$ is
increased. We chose the functions given in (33)-(35). This allows the gait
pattern to be chosen by the single continuous-valued parameter $p_{gait}$. The three
corresponding rotation matrices are used in the connection scheme as presented
in Table 1(c) (using the short notation $R_i = R(\theta_{R,i})$).

Table 1. (a) The phase differences corresponding to the three most common gaits
observed in quadrupeds. (b) The three different rotation angles that are needed
in the construction of the ACPO-CPG. (c) Connection scheme used for the
ACPO-CPG.




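[Equations (33)-(35), which define the rotation angles $\theta_{R,1}$, $\theta_{R,2}$ and $\theta_{R,3}$ as
continuous functions of $p_{gait}$, are not legible in the source. The surviving
fragments show logistic terms of the form $1 / (1 + e^{-20(p_{gait} - c)})$ and a term
in $(p_{gait} - 1)$.]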



Fig. 2. $\theta_{R,1,2,3}$ as a function of the chosen gait parameter. $p_{gait} = 0$ corresponds
to the walk pattern, $p_{gait} = 1$ to trot, and $p_{gait} = 2$ to the bound. Solid line:
$\theta_{R,1}$, dashed line: $\theta_{R,2}$, dash-dotted line: $\theta_{R,3}$. The dots mark the values that
correspond exactly to the different gait patterns. However, the gait patterns
are also stable for settings quite far from these points.
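The bookkeeping behind the gait design, i.e. deriving one phase difference per directed connection from the phases of the four legs, can be sketched as follows. The leg phases used here are assumed textbook-style values chosen for illustration only, since the entries of Table 1(a) are not reproduced in this text; the leg numbering follows the convention given above, and all names are hypothetical.

```python
from itertools import permutations

# Assumed leg phases (fraction of a cycle), for illustration only.
# Leg numbering: 1 = left front, 2 = left hind, 3 = right hind, 4 = right front.
GAITS = {
    "walk":  {1: 0.25, 2: 0.0, 3: 0.5, 4: 0.75},
    "trot":  {1: 0.0, 2: 0.5, 3: 0.0, 4: 0.5},
    "bound": {1: 0.0, 2: 0.5, 3: 0.5, 4: 0.0},
}

def wrap(phase):
    """Map a normalized phase difference into the interval (-0.5, 0.5]."""
    wrapped = phase % 1.0
    return wrapped - 1.0 if wrapped > 0.5 else wrapped

def pairwise_differences(leg_phase):
    """theta_d,ij = theta_i - theta_j for all 12 directed connections."""
    return {(i, j): wrap(leg_phase[i] - leg_phase[j])
            for i, j in permutations(sorted(leg_phase), 2)}

for name, phases in GAITS.items():
    diffs = pairwise_differences(phases)
    print(name, "->", len(diffs), "connections, distinct differences:",
          sorted(set(diffs.values())))
```

Because a phase difference of +0.5 and one of -0.5 describe the same relation on the circle, the wrapped values for the assumed walk collapse to three distinct numbers, while counting +0.5 and -0.5 separately gives the four values mentioned in the text.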



4.6 Simulation Results of the ACPO-CPG 

In the following, the results of numerical integration of the ACPO-CPG are
presented. The system was integrated with a variable step Runge-Kutta solver [11].
The tolerance settings were Trei = 10“^ and Tabs = 10“®. The initial conditions 
were always chosen randomly in $\theta_{1,2,3,4} \in [-1, 1]$. Because the system is robust
against random initial conditions, we do not present the transient behavior at
the beginning of the integration procedure but rather focus on the more
interesting phenomena during gait transitions. In Fig. 3, all possible
transitions are shown. The time $t = 0$ always corresponds to the time when the
gait control parameter $p_{gait}$ is changed abruptly from one setting to another.
Noteworthy here is that not all transitions are made with the same ease. In
particular, the transitions from walk to bound and back take up to about 1.5 s
to begin. The transient time is also longer for these transitions. Furthermore,
the transitions are asymmetric: the transition from walk to bound is faster than
the one from bound to walk. Interestingly, Kelso et al. [9] have shown the same
effects when human subjects are asked to consciously switch from one
coordination task to another. The authors also establish the link to the
physical theories of complex systems that will be addressed in the discussion.

Random fluctuations play a very important role in synergetic systems. It
turns out that they are fundamental to any pattern formation process.
Furthermore, we want our model to be robust against noise. Therefore, we use a
noisy model to test the influence of noise. For that purpose, the differential
equation of the system is transformed into a stochastic difference equation

$$\Delta\mathbf{q} = (1 + \xi) F(\mathbf{q}) \Delta t \qquad (36)$$

where $\xi$ is a uniformly distributed random number in [-0.1, 0.1]. The stochastic
difference equation was then integrated using the Euler method with a time-step 
of At = lO^'^s. Representatively, for the illustration of the effect of the noise, 
the transition from bound to walk has been chosen, because the results
presented above show it to be the slowest. The results are presented in
Fig. 4(a). As can be seen, the transition begins about one second earlier,
while the steady states are basically not affected by that noise level. Thus, our
system is not only robust against noise, but even benefits from it. Such
noise-induced improvements have been shown in a variety of systems [4] and are
now commonly called stochastic resonance.
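A noisy Euler integration of this kind can be sketched for a single ACPO. The example assumes that the stochastic difference equation has the multiplicative form $\Delta\mathbf{q} = (1 + \xi) F(\mathbf{q}) \Delta t$ with one random number $\xi$ drawn per step; the time step, the integration time, the seed and the initial state are further assumptions made for this illustration.

```python
import math
import random

def acpo(q, omega=2.0 * math.pi, g=10.0, r0=1.0):
    """ACPO vector field in Cartesian coordinates."""
    x, y = q
    r = math.hypot(x, y)
    radial = g * (r0 / r - 1.0)
    return (radial * x - omega * y, radial * y + omega * x)

def noisy_euler(q0, t_end=10.0, dt=1e-3, noise=0.1, seed=0):
    """Euler integration of dq = (1 + xi) * F(q) * dt, xi uniform in [-noise, noise]."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    q = q0
    for _ in range(int(round(t_end / dt))):
        xi = rng.uniform(-noise, noise)  # one random number per step (assumption)
        dx, dy = acpo(q)
        q = (q[0] + (1.0 + xi) * dx * dt, q[1] + (1.0 + xi) * dy * dt)
    return q

x, y = noisy_euler((0.3, 0.2))
print("radius after noisy integration:", math.hypot(x, y))
```

In this form the noise rescales the whole vector field at each step, so it jitters the speed along the trajectory rather than pushing the state off the limit cycle. This is consistent with the observation that the steady states are barely affected at this noise level.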

Finally, in order to illustrate one significant advantage of dynamical systems
based CPG models for controlling walking over other methods (e.g. trajectory
replay), we present the behavior of the model in the case of an external
disturbance in Figs. 4(b) and 4(c).

5 Discussion 

The ACPO CPG. We have presented a model for a quadruped central pattern
generator. The model is of distributed nature and shows fast transitions and
only one global attractor. It is robust against noise and perturbations. The
chosen gait pattern is controlled by one continuous variable. From the
algorithmic point of view the model is very simple. Considering all these
properties, we conclude that the ACPO-CPG is a viable candidate for
implementation in a robot. The presented CPG is, however, only applicable to
interlimb coordination. Additional oscillators are needed for intralimb
coordination (i.e. coordinating different DOFs at the hip, knee, and ankle).
The presented methodology is nevertheless applicable to these problems as well.

The choice of the subsystem in the form of simple oscillators [1,19,3], and more
specifically phase oscillators [21,20], has been presented before. However, we
motivate our choice with concepts from the physics of complex systems rather
than basing the model on simplified cell models. That the abstraction implied by
the choice of simple oscillators makes sense and rests on firm theoretical
grounds can be seen when the observations made by biologists are viewed from a
complex systems perspective.

Fig. 3. Results of the numerical integration of the ACPO CPG (horizontal axes:
time [s]). a) Trajectories of the ACPO-CPG when switching from walk to trot to
bound and the corresponding phase difference plots ($\theta_{d,ij}$). Dashed line: $\theta_{d,12}$,
solid line: $\theta_{d,13}$, dash-dotted line: $\theta_{d,14}$. The upper figure presents the
oscillatory activity ($x_i$), while the lower figure shows the corresponding phase
difference evolution. b) Phase difference plots for walk to trot (upper figure)
and trot to walk (lower figure). The dashed vertical line indicates the time at
which $p_{gait}$ is changed. c) Walk to bound and bound to walk. d) Trot to bound
and bound to trot.

Fig. 4. a) Further experiments on the influence of perturbations on the
transition speed. The bound to walk transition, which is the slowest, is chosen
as a representative case. Noise is added during the integration procedure (see
text). As can be observed, the transition is initiated about 1 s earlier than in
the case without noise. b), c) To illustrate the robustness against
perturbation that is inherently built into the structurally stable dynamical
system model of the CPG, we present the case in which the state variable for the
left hind leg is fixed for 0.2 s and then released again during walk. The two
vertical lines show the times at which the leg is fixed and released again. As
can be observed, the leg increases in speed in order to catch up with the other
legs and fulfill the requirements of the gait pattern. Within less than 0.5 s,
the normal gait is re-established.

Modeling in the complex systems framework. We argue that these observations
(i.e. low dimensional dynamics and autonomous oscillatory behavior of
nerve centers) and the resulting abstract concepts (i.e. CPG) are not a
coincidence, but rather a necessity. The reason for that necessity can be
understood through physical theories of complex systems developed over the last
few decades. These theories deal with systems that are constructed from active
subsystems. By understanding the concepts covered by these theories and gaining
the insight that modeling controllers for walking robots is an example of a much
broader class of problems, we can turn the physical theories into a design
methodology that allows us to decide which features need to be preserved in our
model and which ones can be abstracted away in order to arrive at a controller
that satisfies given global properties.

Haken [7] puts forward the argument that a large ensemble of interacting
systems normally exhibits low-dimensional dynamics under very broad conditions.
While others had formulated parts of these ideas before, it was his contribution
to formulate an integrated theory of such systems, which he called synergetic
systems. He enhances the concept of the order parameter introduced by
Landau [10]. The order parameters are identified as slowly evolving variables in
a dynamical system (e.g. in the laser, a prime example of self-organizing
systems, the order parameter is the field strength of the laser light). The
order parameters turn out to be the unstable modes of that system, and their
number usually remains very small compared to the full state space of the
system. The key point is that all the other variables of the system follow the
order parameters, and, on the other hand, the activity of the full system
influences the evolution of the order parameters. Haken formulated that fact in
the slaving principle. One can build hierarchies of systems where the order
parameters of one subsystem constitute the subsystems for the next hierarchy
level. In the case of the locomotory system the order parameters of interest are
the phase relationships between the limbs. The time scales of the order
parameters and the subsystems differ by about three orders of magnitude
(neurons: ~ $10^{-3}$ s, limb activity: ~ $10^{0}$ s). The different scales are typical
for synergetic systems. Furthermore, it has been shown theoretically [7] and
experimentally [14] that the behavior of the order parameters is largely
independent of the exact nature of the subsystems. Moreover, at the order
parameter level, completely new phenomena can occur, which are not foreseeable
at the subsystem level^.

Conclusion of the complex systems perspective. Considering the aforementioned
facts, it becomes clear that there are two approaches to modeling the behavior
of such systems. Both of them have strengths and weaknesses. The first method
is to derive models for the subsystems and couple them to come up with the
complete model. This is an important approach, especially if one is interested
in the exact behavior of the real system being modeled and the influence of all
the parameters (for an example see [13]). However, especially when the chosen
level of description is very detailed, this method is rather tedious, and it
leads to complicated models that are normally computationally intensive and
possess a large number of parameters. One has to have an enormous knowledge of
the details of the subsystems, which in reality is often missing. In particular,
if one is successful with this modeling approach, one will rediscover the
aforementioned system hierarchies. The other approach is to focus on the
order-parameter level, if one is mainly interested in mimicking the overall
system behavior. It is a phenomenological approach. The advantage here is that
one is freed from a huge number of parameters, and the systems are usually
simple and easy to simulate. Yet, the physics guarantees that we still capture
the important aspect of the system behavior, namely the behavior of the order
parameters (e.g. models for human inter-limb coordination, see [8,9]). The model
derived by this approach typically consists of one low-dimensional dynamical
system describing the behavior of the order parameters (e.g. quadruped CPG, see
[16]). In this article we follow the second approach.

As we are free to choose which level of the system hierarchy we would like
to model in order to arrive at a usable model for a robotic application, a
good approach is to keep a distributed model consisting of a few subsystems.
The subsystems themselves are still models of complex systems. Therefore, they
model order-parameter behavior. Naturally, one splits up the whole system into
subsystems where the system being modeled also shows some modularization
(e.g. body segments, limbs, ...) or where we identify parts that lend themselves
to easy measurement of the subsystem behavior. In the case of the walking
controller, the order parameters are the population activity of the motoneurons
for one limb. The population activity serves to drive the muscles. The
subsystems are the single neurons of the limb CPG, the muscle cells and all the
other numerous parts that form the neuro-mechanical system. Because we are at
the order-parameter level of description, it becomes clear that there is no need
to use models that are motivated by observations made on the single neuron in
order to model the behavior on the CPG level. Another motivation for the choice
of the canonical subsystem in the form of a simple phase oscillator is the fact
that, from a mathematical viewpoint, all limit cycle systems belong to the same
universality class [2]. That is, effects observed in one limit cycle system are
also to be expected in another limit cycle system (however, in practical cases
the relationship is often accessible only in a qualitative manner, even if, from
the mathematical viewpoint, a quantitative relationship exists). Furthermore,
with this method we arrive at a model that does not show certain drawbacks of
earlier models, such as dependence on initial conditions, slow or missing
transitions, or periodic driving, which shows that our modeling approach is
viable. Probably the most fundamental advantage for our goal of controlling
robots is that by choosing the simple oscillator model, we can predict the phase
relationships more easily and, to a certain extent, by analytical methods.

^ Also known as emergence, network effects, self-organization.

The level of abstraction of the ACPO-CPG corresponds to the order parameter
description of dynamical systems. At this level of description a very simple
model can be derived, as shown by [16]. The model presented by Schöner et al.,
however, is no longer of distributed nature. As mentioned before, one
interesting property of synergetic systems is their distributed nature. In a
robot one would like to have simple distributed control for low level tasks such
as locomotion, thus allowing a central processor to use its power to address
more involved tasks, such as path planning, communication and the like.
Therefore, in this contribution we constructed a model with a more complex
structure that lends itself to a distributed implementation in a robot built of
uniform elements.

Outlook and future work. From a more theoretical point of view, it will be
interesting to perform a more rigorous analysis of the model, e.g. a bifurcation
analysis. Furthermore, it will be interesting to take a closer look at the
improvement by noise, and to compare the observations with other examples and
theoretical considerations about stochastic resonance. It is known that there
exists a certain optimal level of noise for a given system. This optimum remains
to be found.

Since the characteristics of coupled dynamical systems that we exploited to
construct the ACPO-CPG are universal characteristics that can be observed in
many real-world systems, such as semiconductors [14], analog electronics [12],
chemical reactions [17] and many more, one is in principle able to implement
these models on top of a variety of substrates. Nature's choice is neurons, but
for applications we are not restricted to this substrate. The substrate of
choice for implementation in the long term will be the one that offers
appropriate control over the characteristic time and length scales on the one
hand, and suitable operating conditions (temperature, field strengths, power
consumption) on the other hand. In addition, it should be cheap and simple to
manufacture. Many experiments are therefore still needed to find such suitable
substrates and ways of implementing the systems on top of them.

Conclusions. In recent years a lot of progress has been made in understanding
complex systems from a theoretical point of view. Moreover, advances in
technology allow us to implement and partially simulate systems of a complexity
hitherto impossible. Yet, for applications, these powerful concepts are not yet
exploited in a systematic fashion. Researchers in different fields often make
implicit use of the concepts contained in the theory of complex systems when
they make investigations and observations, yet sometimes make assumptions that
are not well aligned with this theory. In the authors' opinion, it is important,
and one of the grand challenges for the next decades, to transform this
knowledge into design principles and to collect experience in order to harness
the full power of active distributed systems. The research presented here
belongs to a more general effort that aims at using theories of coupled
dynamical systems in the solution of difficult engineering problems and tries to
devise new design principles. The possible fields of application are numerous:
network engineering, multichannel information transmission, sensor networks and
robotics, just to name a few.

Abstract. In the construction of agent-based distributed systems, agent
traversal is one of the most fundamental building blocks. It is the operation
that makes each agent visit each node exactly once and return to its
originating node. In this paper, we consider an ant-based approach to the
agent traversal problem. We present three distributed implementations of
the ant system and evaluate their performance by simulation. Moreover,
we propose a novel lightweight implementation of the ant system in which
unnecessary network traffic is reduced. The performance of the
lightweight implementation is also evaluated by simulation. The
simulation results show that this implementation achieves a drastic
reduction of network traffic.



1 Introduction 

Recently, mobile agents have attracted much attention in the area of distributed
computing [5]. A mobile agent is a program that can migrate from node to node in
a distributed system. It can suspend its execution at an arbitrary point to
leave a node, and can resume the execution on arrival at a new node. There have
been many efforts to develop distributed systems based on mobile agents. In the
construction of agent-based distributed systems, agent traversal is one of the
most fundamental building blocks [6]. It is the operation that makes each agent
visit each node exactly once and return to its originating node. Agent traversal
can be used to implement applications such as data collection, searching and
filtering, network management, and so on. Thus, agent traversal with the
minimum cost is highly desirable.

The problem of finding the agent traversal with the minimum cost is equivalent
to the well-known traveling salesperson problem (TSP). Since the TSP is an
NP-hard problem, several heuristic approaches have been investigated.

The ant colony optimization (ACO) is one of the most successful approaches
to the TSP [1][2][3][4]. The ant system, which is an instance of ACO, is a
distributed search paradigm using multiple agents (called ants) for
combinatorial optimization problems, and is inspired by the behavior of ants.
The ant system finds a good solution by an iterative process. In each iteration,
each agent constructs a candidate solution. Each construction is
probabilistically guided by heuristic information about the problem instance and
by the experience of the agents. The experience of an agent is indirectly
conveyed to other (or future) agents through the pheromone, which represents how
attractive a solution is. Once an agent finds a good solution, it deploys
pheromone so that the other agents search the neighborhood of that solution. In
the case of the TSP, each agent probabilistically moves from node to node to
construct a tour. When all agents complete the construction, the pheromone level
of every link is adjusted so that a link included in a tour with a smaller cost
has a higher pheromone level.

Despite the autonomy each agent has, the ant system is introduced as a 
centralized system. To solve the agent traversal problem in a distributed system, 
distributed implementations of the ant system must be developed. 

In this paper, we investigate decentralized implementation of the ant system. 
A straightforward implementation is to execute each iteration of the ant system 
in a synchronous fashion. However, the inherent asynchrony of a distributed sys- 
tem causes waiting time of agents at the end of each iteration. To avoid the 
waiting time, we propose two asynchronous implementations, and show from 
simulation results that these implementations successfully introduce the asyn- 
chrony without sacrificing the convergence of solutions. 

Another contribution of this paper is a lightweight implementation in which
the number of searches is reduced without degrading the quality of the solution.
In the original ant system, once the cost of the obtained solution comes near
the optimum, only a few searches can still improve the solution. That is, if
each agent continues to search, almost all searches no longer contribute to
improving the solution. In a centralized system, those useless searches do not
seriously affect the performance of the system. In a distributed system, on the
other hand, one search corresponds to a tour of an agent, so those useless
searches cause seriously heavy traffic in the network. Thus, the useless
searches should be avoided.

The simplest strategy to avoid the unnecessarily heavy traffic is to gradually
lower the search frequency of each agent. However, this strategy has another
critical problem: in a real network, the cost of every link changes dynamically.
If several links of the best tour obtained so far increase their costs, the tour
should be updated. If the search frequency is low, however, it takes a long time
to find an updated good tour, and the system cannot adapt to the dynamic changes
of the network. We present a modification of the strategy that can adapt to
dynamic changes of the network. We also present simulation results that show its
adaptability to dynamic changes while keeping the number of searches low.

This paper is organized as follows. First, we define the distributed system
and introduce the traditional ant system in Section 2. We present the three
implementations of the ant system in a distributed system in Section 3, and
present the lightweight search scheme in Section 4. Both of these sections
contain simulation results. Finally, we conclude this paper and state future
research issues in Section 5.



2 Preliminaries

We consider a distributed system consisting of $n$ processes
$p_0, p_1, \ldots, p_{n-1}$. The processes can communicate with each other by
exchanging messages. We assume that any pair of processes can communicate
directly, that is, we assume the distributed system is completely connected.

Recently, distributed systems based on mobile agents have attracted much
attention. A mobile agent is a program that can migrate from node to node in a
distributed system. It can suspend its execution at an arbitrary point to leave
a node, and can resume the execution on arrival at a new node. In agent-based
distributed systems, agent traversal is one of the most fundamental operations.
It requires each mobile agent to make a tour in which it visits each node
exactly once and returns to its originating node.

We assume that the cost for an agent to migrate from process $p_i$ to process
$p_j$ is given by $c_{ij}$ in the distributed system. We also assume the costs
are symmetric, that is, $c_{ij} = c_{ji}$ holds for any distinct processes $p_i$
and $p_j$. When the migration cost between every pair of processes is given, the
agent traversal with the minimum cost becomes important. In this paper, we
consider the agent traversal problem in a distributed system, which is to find
the minimum cost tour.
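As a small illustration of the objective, the following Python sketch computes the cost of a closed tour from a symmetric cost matrix. The function name `tour_cost` and the example costs are assumptions made for illustration only; they do not come from the original system.

```python
def tour_cost(tour, cost):
    """Cost of the closed tour that visits `tour` in order and returns to tour[0]."""
    n = len(tour)
    return sum(cost[tour[i]][tour[(i + 1) % n]] for i in range(n))


# Hypothetical symmetric migration costs (cost[i][j] == cost[j][i]) for four processes.
cost = [
    [0, 2, 9, 4],
    [2, 0, 6, 3],
    [9, 6, 0, 1],
    [4, 3, 1, 0],
]

# Tour p0 -> p1 -> p3 -> p2 -> p0 costs 2 + 3 + 1 + 9 = 15.
print(tour_cost([0, 1, 3, 2], cost))
```

Because the costs are symmetric, traversing the same tour in the opposite direction gives the same cost.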

The agent traversal problem is obviously equivalent to the traveling
salesperson problem (TSP). Since the TSP is a well-known NP-hard problem,
several heuristic approaches have been investigated. The ant system is one of
the approaches to the TSP. In the ant system with $m$ agents (called ants), each
agent makes a tour, visiting each process exactly once and returning to its
originating process. Whenever an agent visits a process, it determines the next
process to visit with a probability based on the pheromone level and the cost
between processes. When all agents complete their tours, the pheromone level is
adjusted so that a link belonging to a tour with a smaller traveling cost has a
higher pheromone level, and a link chosen by more agents has a higher pheromone
level.

The framework of the agent system is given in Figure 1.



Initialization
repeat until a termination condition holds
    for each agent
        make a tour probabilistically
    for each link $(p_i, p_j)$
        update the pheromone level of $(p_i, p_j)$
return the minimum cost tour obtained



Fig. 1. The framework of the agent system



Let $\tau_{ij}(t)$ be the pheromone level of link $(p_i, p_j)$ at the end of the
t-th iteration. In the initialization, the initial pheromone level
$\tau_{ij}(0)$ is determined so that all links have the same pheromone level,
and the initial locations of the $m$ agents are determined arbitrarily (possibly
probabilistically).

In each iteration, each agent $a_k$ ($k = 0, 1, 2, \ldots, m - 1$) makes a tour
by repeatedly choosing the next process to visit at each visited process. The
next process is chosen probabilistically: when agent $a_k$ is visiting process
$p_i$, the probability of migration to an unvisited process $p_j$ in the t-th
iteration is defined as follows.

$$P_{ij}(t) = \frac{(\tau_{ij}(t-1))^{\alpha} \cdot (\eta_{ij})^{\beta}}{\sum_{l:\, p_l \text{ is not visited by } a_k} (\tau_{il}(t-1))^{\alpha} \cdot (\eta_{il})^{\beta}}$$

The quantity $\eta_{ij}$ is called visibility and is defined as
$\eta_{ij} = 1/c_{ij}$. Parameters $\alpha$ and $\beta$ allow a user to control
the weight of the pheromone level versus the visibility. The probability implies
that a link is chosen with higher probability when it has a higher pheromone
level and/or a smaller cost.
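The selection step can be sketched in Python as follows. This is a minimal illustration, not the authors' code: each unvisited process is weighted by pheromone raised to `alpha` times visibility raised to `beta`, and the weights are normalized. The names `transition_probabilities` and `choose_next` are hypothetical, and the visibility table `eta` is assumed to be precomputed as the reciprocal of the link cost, as is usual in the ant system.

```python
import random


def transition_probabilities(i, unvisited, tau, eta, alpha, beta):
    """Probability of moving from process i to each process in `unvisited`."""
    weights = {j: (tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def choose_next(i, unvisited, tau, eta, alpha, beta, rng=random):
    """Draw the next process according to the probabilities above."""
    probs = transition_probabilities(i, unvisited, tau, eta, alpha, beta)
    targets = list(probs)
    return rng.choices(targets, weights=[probs[j] for j in targets], k=1)[0]


# Hypothetical example: uniform pheromone, costs c_01 = 1 and c_02 = 2.
tau = [[1.0] * 3 for _ in range(3)]
eta = [[1.0, 1.0, 0.5], [1.0, 1.0, 1.0], [0.5, 1.0, 1.0]]  # assumed eta = 1 / cost
probs = transition_probabilities(0, {1, 2}, tau, eta, alpha=1, beta=2)
print(probs)  # the cheaper link (0, 1) receives the larger probability
```

With equal pheromone levels the choice is driven by visibility alone; a larger `beta` makes the agent greedier with respect to link cost.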

Now we explain the update of the pheromone level in each iteration. Let
$\Delta\tau_{ij}^{k}(t)$ denote the amount of pheromone that agent $a_k$ deploys
at link $(p_i, p_j)$ in the t-th iteration. $\Delta\tau_{ij}^{k}(t)$ is defined
as follows:

$$\Delta\tau_{ij}^{k}(t) = \begin{cases} Q / C^{k}(t) & \text{if } (p_i, p_j) \in T^{k}(t) \\ 0 & \text{otherwise} \end{cases}$$

where $Q$ is a constant, $T^{k}(t)$ denotes the tour $a_k$ makes in the t-th
iteration and $C^{k}(t)$ denotes the cost of $T^{k}(t)$. In addition, the
pheromone of each link is reduced at some ratio; thus, the pheromone level
$\tau_{ij}(t)$ of $(p_i, p_j)$ at the end of the t-th iteration is determined
from $\tau_{ij}(t-1)$ and $\Delta\tau_{ij}^{k}(t)$ as follows:

$$\tau_{ij}(t) = \rho \cdot \tau_{ij}(t-1) + \sum_{k=0}^{m-1} \Delta\tau_{ij}^{k}(t)$$

where $\rho$ is a constant called the evaporation coefficient. Therefore, the
pheromone level is adjusted so that a link is chosen with higher probability
when it is included in tours with smaller costs and when it is chosen by more
agents.
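The centralized update can be sketched in Python as below. This is an illustrative sketch under stated assumptions: every level is first multiplied by the evaporation coefficient, and each agent then deposits a constant divided by its tour cost on every link of its tour, which is the usual ant-system rule and is assumed here. The function name `update_pheromone` is hypothetical, and the deposit is applied in both directions because the costs are symmetric.

```python
def update_pheromone(tau, tours, costs, rho, Q):
    """Return the pheromone table after one iteration of the centralized ant system.

    tours[k] is the tour of agent k (a list of process indices) and costs[k] its cost.
    """
    n = len(tau)
    # Evaporation: every link keeps the fraction rho of its previous level.
    new_tau = [[rho * tau[i][j] for j in range(n)] for i in range(n)]
    # Deposit: each agent adds Q / (its tour cost) to every link of its tour.
    for tour, tour_cost in zip(tours, costs):
        deposit = Q / tour_cost
        for idx in range(len(tour)):
            i, j = tour[idx], tour[(idx + 1) % len(tour)]
            new_tau[i][j] += deposit
            new_tau[j][i] += deposit  # assumed: undirected link, symmetric table
    return new_tau


# Hypothetical example with three processes and a single agent.
tau = [[1.0] * 3 for _ in range(3)]
new_tau = update_pheromone(tau, tours=[[0, 1, 2]], costs=[10.0], rho=0.5, Q=100)
print(new_tau[0][1])  # 0.5 * 1.0 + 100 / 10.0
```

A shorter tour yields a larger deposit, so links on cheap tours used by many agents accumulate the most pheromone.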



3 Distributed Implementations of the Ant System 

Despite the autonomy each agent has, the ant system is introduced as a cen- 
tralized system. To solve the agent traversal problem in a distributed system, 
distributed implementations of the ant system must be developed. In this sec- 
tion, we present three implementations of the ant system in a distributed system 
and show simulation results of the three implementations. 

3.1 Three Implementations of the Ant System 

T. Izumi and T. Masuzawa



repeat until a termination condition holds
    make a tour probabilistically
    make the same tour with updating the pheromone level of each link
        (during the tour, reduce $\tau_{ij}$ by $(1 - \rho)\tau_{ij}(t - 1)$
        if the agent is the first visitor of $p_i$)
    wait for all other agents to complete the current iteration



Fig. 2. Action of each agent in the synchronous implementation



In all three implementations presented in this section, each process stores
the visibility and the pheromone level of its adjacent links. The pheromone
level of every link is adjusted according to the tours the agents make. This
adjustment is also executed by the agents: when an agent visits a process, it
updates the pheromone level according to the latest tour it made. Thus, the t-th
iteration of the ant system basically executes two phases as follows.

1. The first phase: each agent $a_k$ makes a tour $T^{k}(t)$ by choosing the
next process with the probability $P_{ij}(t)$ as described in the previous
section.

2. The second phase: each agent increases the pheromone level of link
$(p_i, p_j)$ by $\Delta\tau_{ij}^{k}(t)$ as described in the previous section.

In what follows, we present three distributed implementations. The three
implementations differ in the synchronization they require among the agents. The
first implementation requires the strongest synchronization, and the second and
third implementations require no synchronization. Since distributed systems have
inherent asynchrony, weaker synchronization is more desirable from the viewpoint
of implementation. However, since the original ant system is a centralized one,
only an implementation with strong synchronization can simulate the original ant
system exactly. Thus, the asynchrony introduced in the implementations may
affect the convergence ratio of the ant system. This effect is evaluated by
simulation in the next subsection.

Synchronous implementation. The first implementation is the straightforward
one. In this implementation, all agents execute each iteration synchronously,
that is, each agent can start the next iteration only when all agents have
completed the current iteration.

In the second phase of the t-th iteration, each agent makes the same tour it
made in the first phase and increases the pheromone level $\tau_{ij}$ of each
link $(p_i, p_j)$ by $\Delta\tau_{ij}^{k}(t)$. At the same time, the pheromone
level is reduced by $(1 - \rho)\tau_{ij}(t - 1)$. This reduction is executed by
the agent that first visits each process $p_i$.

Fig. 2 shows the action of each agent in the synchronous implementation. 

It is obvious that the synchronous implementation simulates the original ant 
system exactly. However, it has the disadvantage that each agent has to wait for 
all other agents to complete the current iteration. This synchronization overhead 
cannot be ignored in distributed systems. 

Asynchronous implementation. The second implementation is an asynchronous
version of the first implementation. In this implementation, each agent makes
two tours in each iteration as in the first implementation, but the agents need
not wait for other agents to complete the current iteration.

Fig. 3 shows the action of each agent in the asynchronous implementation. 



repeat until a termination condition holds
    make a tour probabilistically
    make the same tour with updating the pheromone level of each link
        (during the tour, reduce $\tau_{ij}$ to $\rho' \cdot \tau_{ij}$ first,
        and then increase $\tau_{ij}$ by $\Delta\tau_{ij}^{k}(t)$)



Fig. 3. Action of each agent in the asynchronous implementation



Since distributed systems have inherent asynchrony, agents may execute
different iterations at the same time. Thus, when a process $p_i$ is visited
first by an agent in its t-th iteration, some other agent may still be executing
the (t − 1)-st or an earlier iteration. The agent therefore cannot reduce the
pheromone of link $(p_i, p_j)$ by $(1 - \rho)\tau_{ij}(t - 1)$, because
$\tau_{ij}(t - 1)$ is not known yet. In the second implementation, the
evaporation of the pheromone level of each link is executed by each agent using
another evaporation coefficient $\rho'$ instead of $\rho$, where $\rho'$ is an
appropriate constant (e.g. $\rho' =$

In the asynchronous implementation, no agent waits for other agents to complete
the iteration; thus, the implementation is expected to have higher performance
than the synchronous implementation. However, in the asynchronous
implementation, the pheromone level of a link may take a value that is never
taken in any execution of the original ant system. Thus, the asynchrony may slow
down the convergence to a good solution. This effect is evaluated by simulation
in the next subsection.
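The second traversal of the asynchronous implementation might be sketched in Python as follows. This is only one possible reading of the description: the name `second_traversal` is hypothetical, row `tau[i]` is assumed to be the table stored locally at process i, the visiting agent is assumed to evaporate every link in that row with its own coefficient `rho_prime`, and the deposit is assumed to be `Q` divided by the tour cost.

```python
def second_traversal(tau, tour, cost_of_tour, rho_prime, Q):
    """Repeat `tour` and update the local pheromone table of each visited process.

    At each process the agent evaporates first and deposits afterwards, without
    waiting for or coordinating with any other agent.
    """
    n = len(tau)
    for idx, i in enumerate(tour):
        nxt = tour[(idx + 1) % len(tour)]
        for j in range(n):
            if j != i:
                tau[i][j] *= rho_prime          # reduce to rho' * tau_ij first
        tau[i][nxt] += Q / cost_of_tour         # then add this agent's deposit


# Hypothetical example: three processes, one agent, one tour of cost 10.
tau = [[1.0] * 3 for _ in range(3)]
second_traversal(tau, tour=[0, 1, 2], cost_of_tour=10.0, rho_prime=0.9, Q=100)
print(tau[0])  # link (0, 1) was reinforced, link (0, 2) only evaporated
```

Because every agent applies its own evaporation, the total evaporation of a link depends on how many agents have visited it, which is why the intermediate pheromone levels can differ from those of the original ant system.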



Asynchronous implementation with lazy pheromone update. The third
implementation is a modified version of the second implementation. From the
simulation results, we can conclude that the asynchrony does not cause longer
convergence to a good solution. This suggests the idea of a lazy update. In this
implementation, each agent makes a single tour in each iteration. In the t-th
iteration, each agent probabilistically determines the t-th tour as in the
second implementation and, during that tour, updates the pheromone level by
increasing $\tau_{ij}$ of each link $(p_i, p_j)$ by
$\Delta\tau_{ij}^{k}(t - 1)$. To make the update, each agent carries the
information about the (t − 1)-st tour during the execution of its t-th tour.
Notice that the pheromone update for the (t − 1)-st tour is executed in the
(t − 1)-st iteration in the first and second implementations, but in the t-th
iteration in the third implementation.

Fig. 4 shows the action of each agent in the asynchronous implementation 
with the lazy pheromone update. 

repeat until a termination condition holds
    make a tour probabilistically with updating the pheromone level of
    each link according to the tour it makes in the previous iteration
        (during the tour, reduce $\tau_{ij}$ to $\rho' \cdot \tau_{ij}$ first,
        and then increase $\tau_{ij}$ by $\Delta\tau_{ij}^{k}(t - 1)$)



Fig. 4. Action of each agent in the asynchronous implementation with lazy update



In the third implementation, each agent finds a tour once per traversal,
whereas in the second implementation, each agent finds a tour once in two
consecutive traversals. Thus, compared to the second implementation, the number
of tours each agent finds is nearly doubled in the third implementation. This
implies that the third implementation is expected to converge faster than the
second implementation. The effect of the lazy update is evaluated by simulation
in the next subsection.
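A single traversal with lazy update might look like the following Python sketch. It is an illustration under several assumptions rather than the authors' implementation: the name `lazy_traversal` is hypothetical, the agent carries its previous tour as a successor map `prev_next`, each process is assumed to hold one row of the table, and the next hop is assumed to be chosen before the local row is modified.

```python
import random


def lazy_traversal(tau, eta, start, prev_next, prev_cost,
                   alpha, beta, rho_prime, Q, rng):
    """Build a new tour while applying the update that belongs to the previous tour.

    prev_next maps each process to its successor in the previous tour; it is
    empty in the first iteration, when there is nothing to deposit yet.
    """
    n = len(tau)
    tour, current = [start], start
    unvisited = set(range(n)) - {start}
    while True:
        nxt = None
        if unvisited:
            cand = sorted(unvisited)
            weights = [tau[current][j] ** alpha * eta[current][j] ** beta
                       for j in cand]
            nxt = rng.choices(cand, weights=weights, k=1)[0]
        # Lazy update at the visited process: evaporate, then deposit on the
        # link that the agent used when leaving this process in its previous tour.
        for j in range(n):
            if j != current:
                tau[current][j] *= rho_prime
        if current in prev_next:
            tau[current][prev_next[current]] += Q / prev_cost
        if nxt is None:
            return tour  # all processes visited; the agent returns to `start`
        tour.append(nxt)
        unvisited.discard(nxt)
        current = nxt
```

Each call corresponds to one network traversal, so an agent produces one new tour per traversal instead of one per two traversals.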

3.2 Simulation Results 

In this subsection, we compare the three implementations proposed in the
previous subsection by simulation. In the simulation, we simplify the migration
of an agent: each agent takes one step to migrate from node to node, independent
of the cost of the link. It may seem natural that one migration should take a
number of steps that depends on the cost of the link. However, in a preliminary
simulation we confirmed that this refinement does not affect the simulation
results, apart from slowing down the simulation. Thus, we adopt the simplified
assumption.

The execution time of the original ant system is measured by the number of 
iterations. However, since we want to evaluate their performance in distributed 
systems, the number of iterations is not good for our objective. Instead of the 
number of iterations, thus, we use the number of rounds as the measure of the 
execution time. The round is commonly used to measure the execution time of 
distributed algorithms, and each round is defined to be the minimum time period 
such that every agent executes at least one step during the period. Notice that 
an agent may execute two or more steps because of asynchrony. Informally, an 
agent executing more steps in one round is considered to work faster. To simulate 
the asynchronous execution of the distributed ant system, the agent that will 
execute the next step is randomly selected. 
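The round-based time measure can be sketched as follows. This is a hypothetical scheduler, assuming that the agent executing the next step is drawn uniformly at random and that a round ends as soon as every agent has executed at least one step since the round began; the function name `run_rounds` and the `step` callback are illustrative.

```python
import random


def run_rounds(num_agents, num_rounds, step, rng):
    """Run `num_rounds` rounds and return the number of steps taken in each round."""
    steps_per_round = []
    for _ in range(num_rounds):
        pending = set(range(num_agents))   # agents that have not stepped this round
        steps = 0
        while pending:
            k = rng.randrange(num_agents)  # randomly selected agent
            step(k)                        # agent k executes one step
            pending.discard(k)
            steps += 1
        steps_per_round.append(steps)
    return steps_per_round


# Hypothetical usage: count how many steps each of five agents executes.
counts = [0] * 5


def count_step(k):
    counts[k] += 1


print(run_rounds(5, 3, count_step, random.Random(0)), counts)
```

In one round some agents execute several steps while the slowest agent executes exactly one, which matches the informal notion that agents executing more steps per round work faster.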

We execute the simulation with the following settings. The problem instance
is randomly generated, that is, each link has a randomly selected cost. The
ant-system parameters are set to $n = m = 50$, $\alpha = 1$, $\beta = 6$,
$Q = 100$, and $\rho = 0.5$.

The simulation results are presented in Figure 5. The figure shows how the 
cost of the best obtained tour is improved as the simulation round advances. 

Fig. 5. Comparison of the three distributed implementations



The simulation results show that there is no considerable difference among
the three implementations. However, a remark on the simulation of the
synchronous implementation is in order. In the simulation of the synchronous
implementation, every agent executes exactly a single step in each round. This
execution can be considered the ideal synchronous one and can exactly simulate
the execution of the centralized ant system. However, the simulation results
for the synchronous implementation do not contain the overhead for the
synchronization, i.e., the overhead for all agents to be synchronized at the end
of every iteration. Thus, from the simulation results, we can conclude that both
of the asynchronous implementations outperform the synchronous one in actual
distributed systems, because the synchronous implementation requires additional
overhead for synchronization.

On the other hand, the two asynchronous implementations do not show a
significant difference in performance, although we expected an improvement due
to the lazy update. We consider that the lazy update can achieve a distinct
improvement for larger networks.



4 Reduction of Useless Searches 

In the original ant system, as the cost of the obtained solution comes near the
optimum, only a few searches can still improve the solution. To avoid
unnecessary traffic in the network, the number of such useless searches should
be reduced without sacrificing adaptability to dynamic changes of the network.
In this section, we propose Sleep-Awake Search, a novel lightweight search
scheme, and evaluate its efficiency by simulation.

4.1 Sleep-Awake Search

In this subsection, we introduce the sleep-awake search. The sleep-awake search
introduces two additional actions of agents, Sleep and Awake, to the agent
system. In the sleep-awake search, an agent that continuously fails to improve
the obtained solution ceases operation for some period (Sleep), and an agent
that detects an extreme change of the network gets back to operation (Awake). In
the rest of this subsection, we show the details of the two actions.

(a) Sleep: Each agent memorizes the best tour that it has ever found and
tries to improve the tour by searches. If an agent continuously fails to improve
its best tour, it goes to sleep for some period. More precisely, each agent
works as follows. Each agent $a_k$ maintains a variable $c_k$ that counts the
number of times the agent has consecutively failed to improve the best tour.
When the agent wakes up, it makes a tour probabilistically. If it succeeds in
improving the tour, it sets $c_k = 0$. If it fails, it increments $c_k$ by one.
According to the resulting value of $c_k$, the sleeping time $\Delta_s$ is
probabilistically decided as follows:

$$\Delta_s(c_k) = \begin{cases} 0 & \text{if } c_k < T \\ n(c_k - T)R & \text{if } T \le c_k < T_{max} \\ n(T_{max} - T)R & \text{if } T_{max} \le c_k \end{cases}$$

where $T$ and $T_{max}$ are constant parameters that represent the thresholds,
and $R$ is a random value in the range [0, 1]. Notice that an agent does not
completely stop finding tours, even if it is in the sleep mode. In the
sleep-awake search, a sleeping agent only lowers its own search frequency.
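The sleeping-time rule can be written as a short Python function. This is a sketch of the three cases described above; the function name `sleeping_time` is hypothetical, and `random.random()` is used for the random factor, which draws from the half-open interval [0, 1) rather than the closed interval stated in the text.

```python
import random


def sleeping_time(c, n, T, T_max, rng=random):
    """Sleeping time for an agent that has failed `c` times in a row.

    n is the number of processes; T and T_max are the two thresholds.
    """
    R = rng.random()                 # random factor, here drawn from [0, 1)
    if c < T:
        return 0.0                   # too few failures: keep searching
    if c < T_max:
        return n * (c - T) * R       # grows with the number of failures
    return n * (T_max - T) * R       # capped once c reaches T_max


# Hypothetical usage with n = 50, T = 20, T_max = 40.
for failures in (5, 30, 100):
    print(failures, sleeping_time(failures, n=50, T=20, T_max=40))
```

The expected sleeping time grows linearly between the two thresholds and is bounded above, so a sleeping agent always resumes searching after a finite delay.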

(b) Awake: In every search (i.e., while making a tour probabilistically),
each agent $a_k$ checks the cost of each link in the best tour $T_k$ it has
obtained. After a tour, $a_k$ knows the (nearly) current cost of the tour $T_k$.
If this cost differs sufficiently from the memorized cost of $T_k$, the agent
$a_k$ gets back to the search operation by setting $c_k = 0$ even when it fails
to improve the best obtained tour. Formally, letting $C_c$ be the current cost
and $C_m$ be the memorized cost, if $C_c > (1 + a)C_m$ or $C_c < (1 - a)C_m$
holds for some constant parameter $a > 0$, then $c_k$ is reset to 0. Moreover,
in $a_k$'s next search, $a_k$ wakes up $a_j$ by setting $c_j = 0$ if $a_k$ meets
another sleeping agent $a_j$.
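The counter update that combines the Sleep and Awake rules can be sketched as follows. The function name `update_counter` and its argument names are hypothetical; the logic resets the failure counter when the agent improves its best tour or when the current cost of the memorized best tour leaves the tolerance band around the memorized cost, and increments it otherwise.

```python
def update_counter(c, improved, current_cost, memorized_cost, a):
    """Return the new failure counter of an agent after one search.

    `a` is the tolerance parameter of the Awake rule (a > 0).
    """
    cost_changed = (current_cost > (1 + a) * memorized_cost
                    or current_cost < (1 - a) * memorized_cost)
    if improved or cost_changed:
        return 0          # awake: search at full frequency again
    return c + 1          # one more consecutive failure


# Hypothetical usage with a = 0.1 and a memorized cost of 100.
print(update_counter(7, False, 100, 100, 0.1))  # stable network, counter grows
print(update_counter(7, False, 120, 100, 0.1))  # cost rose by 20%, counter resets
```

A small tolerance wakes agents on minor cost fluctuations, while a large one lets them sleep through larger changes; this is the trade-off the parameter controls.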

Intuitively, while the system is stable, that is, while the link costs change
only slightly, the optimal cost is also stable. Then there is little chance for
each agent to improve its own best tour. Thus, each agent is expected to have a
higher tendency to sleep. In contrast, while the system is unstable, there are
many chances to improve the cost. Thus, each agent is expected to search for
tours actively.

Fig. 6 shows the action of each agent in the sleep-awake search. In the fol- 
lowing subsection, we call the original search scheme continuous search for ease 
of explanation. 

4.2 Simulation Results 

repeat until a termination condition holds
    make a tour $T_k$ probabilistically with updating the pheromone level of
    each link according to the tour it makes in the previous iteration
        (during the tour, reduce $\tau_{ij}$ to $\rho' \cdot \tau_{ij}$ first,
        and then increase $\tau_{ij}$ by $\Delta\tau_{ij}^{k}(t - 1)$)
        (If mode = awake holds and it meets another sleeping agent, wake it up.)
    if $C_c > (1 + a)C_m$ or $C_c < (1 - a)C_m$ then
        c ← 0
        mode ← awake
    else
        c ← c + 1
        mode ← notawake
    wait $\Delta_s(c)$ time units (sleep)



Fig. 6. Action of each agent in sleep-awake search



In this subsection, we evaluate the efficiency of the sleep-awake search scheme
by simulation. The main objective of this simulation is to compare the
sleep-awake search with the continuous search in terms of (1) the number of
searches, and (2) the adaptability to dynamic changes of the network.

To evaluate these two points, we need to prepare instances in which the costs
of links change dynamically. We simulate the two schemes in the following three
settings:

1. Since the best obtained tour is expected to be nearly optimal, only cost
changes of the links in the optimal tour strongly affect the best obtained
tour. Thus we find a nearly optimal tour $T_{opt}$ by the 2-opt heuristic in
advance and take the set $L_{opt}$ of links appearing in $T_{opt}$ as the
target of the cost change. In the simulation, when the current round becomes
$R_T/3$, where $R_T$ is the total number of simulation rounds, the costs of
the links in $L_{opt}$ begin to increase. For each link $(p_i, p_j)$ in
$L_{opt}$, the cost of $(p_i, p_j)$ is increased by $0.05c_{ij}$ every 20
rounds. This increase is repeated 50 times. Next, when the current round
becomes $2 \cdot R_T/3$, the costs of the links in $L_{opt}$ begin to
decrease. For each link $(p_i, p_j)$ in $L_{opt}$, the cost of $(p_i, p_j)$
is decreased by $0.05c_{ij}$ every 20 rounds. This decrease is repeated 50
times.

2. In addition to the change of the first pattern, we introduce random changes
of the link costs. Every 100 rounds, each link $(p_i, p_j)$ increases or
decreases its own cost within ±10% of the current cost if the current cost is
in the range $[0.5c_{ij}, 1.5c_{ij}]$.

3. This pattern involves the random changes of the second pattern and uses the
target links $L_{opt}$ defined in pattern 1. In this pattern, the costs of
the target links are alternately increased and decreased 5 times. The changes
start at rounds $2R_T/7$, $3R_T/7$, $4R_T/7$, $5R_T/7$, and $6R_T/7$. In each
change, the costs of the links in $L_{opt}$ begin to increase (or decrease).
For each link $(p_i, p_j)$ in $L_{opt}$, the cost of $(p_i, p_j)$ is
increased (or decreased) by $g \cdot c_{ij}$ every 20 rounds, where $g$ is
selected randomly in the range [$0.03c_{ij}$, O.OO$c_{ij}$]. For one change,
this increase/decrease is repeated $r$ times, where $r$ is a random value in
the range [30, 70].
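The cost schedule of the first pattern can be sketched as a multiplier applied to the original cost of each target link. This is a hypothetical helper, not the authors' simulator: the name `pattern1_multiplier` is illustrative, the step of 0.05 times the original cost is an assumed reading of the step size, and the first change is assumed to take effect at the starting round itself.

```python
def pattern1_multiplier(r, total_rounds, step=0.05, period=20, repeats=50):
    """Factor by which the original cost of a target link is scaled at round r."""
    up_start = total_rounds / 3          # increases begin here
    down_start = 2 * total_rounds / 3    # decreases begin here

    def applied(start):
        # Number of changes already applied by round r, at most `repeats`.
        if r < start:
            return 0
        return min(repeats, int((r - start) // period) + 1)

    return 1.0 + step * (applied(up_start) - applied(down_start))


# Hypothetical usage for a simulation of 20000 rounds.
for r in (0, 7000, 10000, 14000, 19999):
    print(r, pattern1_multiplier(r, 20000))
```

Under these assumptions the target links grow to 3.5 times their original cost during the middle third of the simulation and return to the original cost in the last third.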

Fig. 7. Adaptiveness to dynamic change of link costs: (1) Pattern 1, (2) Pattern 2, (3) Pattern 3



In each pattern, the parameters of the sleep-awake search scheme are set
differently. In pattern 1, the parameters are set to $a = 0.1$, $T = 20$,
$T_{max} = 40$. In patterns 2 and 3, the parameters are set to $a = 0.15$,
$T = 15$, $T_{max} = 40$. The following setting is common to all experiments of
this subsection: the numbers of processes and agents are both 50. The simulation
is executed for a period of 20000 rounds. The ant-system parameters and the
initialization are the same as in Subsection 3.2.

Figures 7, 8, and 9 present the simulation results. Each graph plots the 
average value of 10 trials. 

Figure 7 presents the adaptability of the two search schemes to dynamic
changes of the link costs. To clarify the adaptability, we also plot the nearly
optimal solution obtained by the 2-opt heuristic for the snapshot of each
round. The simulation results show that the sleep-awake search scheme possesses
almost the same adaptability as the continuous search scheme. However, the peak
value of the sleep-awake search scheme is a little higher than that of the
continuous search. More details can be seen in Figure 8, which presents the
ratio of the solution quality of the sleep-awake search scheme to that of the
continuous search scheme. From the figure, we can see that the ratio is bounded
by about 1.1 even at the peak, and that the ratio is almost 1.0 in most
situations. In fact, the average ratio is almost 1.0 (Table 1).

Fig. 8. The ratio of the solution quality of the sleep-awake search to that of the continuous search: (1) Pattern 1, (2) Pattern 2, (3) Pattern 3



Figure 9 shows the number of searches executed in the two schemes. From
this figure, we can see that the sleep-awake search scheme attains a drastic
reduction in the number of searches in every pattern. In particular, when the
link costs are stable, the sleep-awake search scheme achieves an extreme
reduction in the number of searches (about 35% in pattern 2, and about 55% in
pattern 1).



Table 1. Average ratio and reduction%

            avg. ratio   reduction%
Pattern 1   1.011345     56.1%
Pattern 2   1.011231     35.6%
Pattern 3   1.002310     25.1%

Fig. 9. The number of searches. Panels: (1) Pattern 1, (2) Pattern 2,
(3) Pattern 3. Legend: continuous search, sleep-awake search.



5 Conclusion and Future Research Issues 



In this paper, we proposed an ant-based approach to the mobile agent traversal
problem. First, we investigated the distributed implementation of the ant system.
We presented three implementations, distinguished by their levels of
asynchrony, and evaluated their performance by simulation. The simulation
results showed that these implementations successfully introduce asynchrony
without sacrificing the convergence ratio of the solutions.

Next, we proposed the sleep-awake search scheme, a novel lightweight
implementation in which the number of searches is reduced without degrading the
quality of the solution, and compared it with the (original) continuous search
scheme. The simulation results exhibited a drastic reduction in the number of
searches, especially when the link costs are stable.

As future challenges, we will try to improve the presented methods by
considering the following factors: networks with arbitrary topology, topology
changes such as node insertion/deletion, and more appropriate parameter
settings. We will also apply the method to larger problems.

Abstract. Designing storage area networks is an NP-hard problem. Previous
work has focused on traditional algorithmic techniques to automatically
determine fabric requirements, network topology, and flow routes.
This paper looks at the ability of an ant colony optimisation algorithm to 
evolve new architectures. For some small networks (10 hosts, 10 devices, 
and single-layered) we find that we can create networks which result in 
savings of several thousand dollars over previously established methods. 
This paper is the first publication, to our knowledge, to describe the 
successful application of this technique to storage area network design. 



1 Introduction 

As IT systems and employees become more geographically distributed, it
becomes more and more important to access shared data. Storage Area Networks
(SANs) will become the choice of companies looking for efficient, distributed
storage solutions. A SAN is a set of fabric elements connecting a set of hosts 
- from which data is requested - to a set of storage devices - on which data 
is stored. The fabric elements are fabric nodes, which route data through the 
network, ports on the nodes and links physically connecting the ports. A link 
has a port at each end and a port is the terminal of at most one link. SANs allow 
for efficient use of storage related resources such as hardware and maintenance 
personnel, resulting in a storage solution that is more effective than local storage, 
in addition to being more scalable. 

Once purchased, installed, and configured appropriately, a SAN can be a
cost-effective solution to the storage problem. Recent work has focused on
automating this process, since solutions designed by hand to support specified
data flow requirements tend to over-provision resources by a considerable
margin [14]. Efficiency is an important issue because the physical components
of a storage area network can cost millions of dollars; an over-provisioned
design can waste anywhere from thousands to millions of dollars, depending on
the size of the network.

A SAN problem is specified by providing a list of hosts, a list of devices, a
list of possible types of fabric nodes, and a description of the network’s data
flow requirements. Each host, device, and fabric node has a cost, a maximum
number of ports that are available to accept links, and a maximum amount of
data that may pass through it, called its bandwidth. The network’s data flow
requirements are specified by a list of flows, each of which is defined by a
source host, a
destination device, and a bandwidth requirement. A flow may not be routed 
through a fabric element which does not have enough remaining bandwidth. 
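The problem specification above can be captured in a few record types. The following is a minimal Python sketch; the class and field names (`Element`, `Flow`, `can_carry`) and the numeric values are illustrative assumptions, not taken from the Appia tools or from the implementation used in this paper.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """A host, device, or fabric node (hypothetical record layout)."""
    name: str
    cost: float        # purchase cost of the element
    max_ports: int     # ports available to accept links
    bandwidth: float   # maximum amount of data that may pass through it
    used: float = 0.0  # bandwidth already consumed by routed flows

    def can_carry(self, demand: float) -> bool:
        # A flow may only be routed through an element that still has
        # enough remaining bandwidth.
        return self.bandwidth - self.used >= demand


@dataclass
class Flow:
    source: str        # name of the source host
    destination: str   # name of the destination device
    demand: float      # bandwidth requirement of the flow


# Made-up numbers, for illustration only.
switch = Element("sw0", cost=5000.0, max_ports=8, bandwidth=100.0)
flow = Flow("host0", "dev0", demand=60.0)

ok_first = switch.can_carry(flow.demand)    # 100 remaining, 60 needed
switch.used += flow.demand
ok_second = switch.can_carry(flow.demand)   # only 40 remaining
```

In this sketch the second identical flow is rejected because the element has only 40 units of bandwidth left, which is the bandwidth constraint stated above.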

A SAN design specifies a list of each fabric element and its connectivity
along with a path for each flow. The aim of an automated SAN designer is to
find the cheapest SAN that supports the specified flows, while satisfying the
port constraints, bandwidth constraints and non-splitting of flows.

The problem of SAN design can be compared to that of design of other types 
of networks, as well as the problem of routing data within those networks. How- 
ever, SAN design is more difficult than other network design problems because 
in this case, there are also the additional limitations of not being able to split a 
data flow from host through to device, the limited number of ports available on 
the nodes, the limited amount of bandwidth associated with each node and port, 
and the fact that the network topology is not pre-determined. It is NP-hard to
find the minimal cost network, and best-known algorithms on state-of-the-art
machinery take days to complete for moderate sized problems.

Hewlett-Packard’s Appia project [14] has shown that traditional algorithmic
optimisation techniques can quickly specify a topology that both satisfies the
design requirements and competes with designs created by human SAN experts.
While able to quickly determine a possible SAN topology, the Appia algorithms
are not guaranteed to find the optimal solution. As a result of the need to find
a solution within minutes, the algorithms presented by the Appia group build
usable networks following heuristic procedures that have previously been shown
to yield good networks.

This paper seeks to explore SAN design using Ant Colony Optimisation al- 
gorithms to produce well designed SANs. We will evolve topologies which will 
result in original, buildable designs. We will show that the use of an ant based 
algorithm can result in SAN architectures which cost thousands of dollars less 
when compared with the cost of designs created by traditional heuristic meth- 
ods. This paper is, to the best of our knowledge, the first publication to describe 
the successful application of these ant inspired techniques to SAN design. 

The remainder of this paper is structured as follows. In the following section
we discuss previous work relating to network design and routing of data
through a network, both in terms of other types of networks and in relation to
SANs specifically. We then discuss the ant colony optimisation implementation
used and the results obtained in comparison to Appia designed networks, and
conclude with some thoughts for further work.



2 Previous Research 

Automating SAN design is a relatively new research area and prior work
specifically relating to Storage Area Networks is limited. In this section we
will discuss research undertaken in determining network topology and network
routing, both of which relate to SAN design. We will encompass work relating to
both SANs and other types of networks which may be similar in structure and
constraints.



2.1 Previous Automated SAN Design Work 

Much work has been done by the Decision Technologies Department and the 
Storage Content and Distribution Department within HP Labs Palo Alto, in 
order to automate the process of SAN design [14]. They have concentrated on 
two different algorithms called FlowMerge and QuickBuilder, which each have 
different strengths and weaknesses in terms of finding efficient solutions to the 
SAN fabric design problem. Each will be described in brief; for more details, see
[14]. The FlowMerge algorithm begins with a SAN connecting each host to its
required device, given a set of flows. This configuration typically results in a large 
number of port violations (i.e. a node has more links than available ports). These 
are gradually reduced by considering individual flowsets. Each flow is initially 
considered to be in its own flowset. With each iteration of the algorithm, two 
flowsets are merged together, choosing an appropriate fabric node, and links to 
connect hosts and devices appropriately. Each iteration results in a reduction of 
the number of port violations, or, if that is not possible, a reduction in the cost 
of the design. The algorithm continues until there are no possible improvements 
on the design, or there are no other flowsets that may be merged. 

Ward, et al. [14] show that FlowMerge is one of the faster performing algo- 
rithms for the smaller 10 host, 10 device networks, especially those which have 
20 to 30 flows spread fairly evenly throughout the network. 

The QuickBuilder algorithm also begins with a SAN connecting each host 
to an associated device as given by a set of flow requirements. However, in this
case, the initial SAN configuration includes assignment to a particular port on 
each device. The configuration is then arranged into port groups, which consist 
of all connected ports. Each port group is then analysed separately in order to 
determine fabric node requirements. 

While FlowMerge tends to find solutions with many small port groups, the 
QuickBuilder algorithm tends to find SAN configurations with larger port groups 
when necessary. This algorithm tends to result in cost effective designs for large 
networks and those that are more densely populated. QuickBuilder is also faster 
than the FlowMerge algorithm for large problems (10 times as fast for the largest 
problems consisting of 50 hosts and 100 devices). 

2.2 Automated Design of Other Types of Networks 

There appears to be limited published work relating to automated design of 
Storage Area Networks specifically, aside from that referenced above. Much of 
the automated design work has been done for other types of networks, and will 
be discussed in the following sections. 

Automated network design is not a new research field. Several researchers
have attacked this problem with traditional techniques, for networks with varying
constraints. However, there is no other network problem which also contains
all of the constraints placed on network design for SANs [14]. Nodes within a
SAN have a limited number of ports available for connections to other nodes
(i.e. they are degree constrained) and have some cost associated with their use.
Also, nodes, and other fabric elements in a SAN, have an associated maximum
capacity. Additionally, a flow must not be split between separate fabric elements
(i.e. the network is non-bifurcated). Finally, the set of nodes in a SAN is not
known in advance. The size, number, and type of fabric elements are determined
as part of the design problem.



Ant Inspired Algorithms for Network Routing. Work on practical ap- 
plications of ant inspired algorithms to telecommunications networks was con- 
ducted by Schoonderwoerd, et al. in 1996 [10] and by Steward and Appleby in 
1994 [13]. In both cases the network topology was fixed, and “Ants” were used 
to determine suitable routing tables with respect to a simulated load. 

Steward and Appleby’s work uses two types of agents. The first, called a load 
agent, explores the network and determines the best routes from a particular 
node depending on the amount of traffic on the network at a particular time. 
One load agent affects the routes from only one node and once it has updated 
that node’s routing table, the agent expires. The other type of agent is a parent 
agent. The parent agents determine which nodes need new load agents (based 
on traffic patterns). There is additionally a set of parent monitors which manage
the number of parent agents present in a system. The parent monitor is meant to 
have parallels to the queen ant in a colony. The two types of agents are meant to 
be inspired by regular worker ants. A combination of the influence of each of the 
load agents was able to produce valid routing tables for the simulated networks. 
Nodes which started to become congested had calls routed around them and the 
load became more evenly spread throughout the network. However, this model
is very loosely inspired by an ant colony, and only in the sense that the work is
distributed and there is no direct communication between the individual agents.

Schoonderwoerd et al.’s work built on the 1994 work by Steward and Appleby.
A set of pheromone tables was used to record a probability of an ant
choosing a particular path to a final destination point. These tables are updated 
by an ant as it successfully completes a tour. For each timestep, a set of ants 
are launched on the network from random starting nodes and are given an arbi- 
trary destination node. Through the pheromone tables, they explore and update, 
finding appropriate paths through the network. The simulated calls can then be 
applied to the network, choosing the routing of the call by picking the path with 
the highest probability in the pheromone tables. This ant based routing algo- 
rithm was compared against results obtained with the agent based (but still ant 
inspired) approach in [13] and was found to significantly decrease the percentage 
of call failures on the simulated networks. 
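The pheromone-table idea can be illustrated with a small Python sketch. The text above does not state the update rule, so the renormalising reinforcement used here (add an amount `delta` to the entry for the hop the ant used, then divide every entry by `1 + delta`) is an assumption, and the names `reinforce`, `row`, and `delta` are hypothetical.

```python
def reinforce(row, hop, delta):
    """Return a new table row with `hop` reinforced by `delta`.

    row maps each candidate next hop (for one destination) to a probability.
    Dividing by (1 + delta) keeps the row summing to one (assumed rule).
    """
    return {h: (p + delta if h == hop else p) / (1.0 + delta)
            for h, p in row.items()}


# One node's table row for a single destination, initially undecided.
row = {"B": 0.5, "C": 0.5}

# An ant that travelled via neighbour B reinforces that entry.
row = reinforce(row, "B", delta=1.0)

# A call is routed along the entry with the highest probability.
next_hop = max(row, key=row.get)
```

After one reinforcement the row becomes 0.75 for B and 0.25 for C, so calls to this destination are forwarded via B.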



E. Dicke et al.

Ant Inspired Algorithms for Optimisation Problems. ACO algorithms
have been applied successfully to several other types of problems including
vehicle routing [4], university course timetabling [11], the quadratic assignment
problem [8] and routing in communications networks [10], [13], [1]. Several types
of ant inspired algorithms have been used to optimise a given set of variables.
Birattari et al. [2] encapsulate a large number of these with the term Ant
Programming. A popular subset of these algorithms are the Ant Colony
Optimisation (ACO) algorithms, of which Dorigo’s Ant System (AS) algorithm is
one [7], [3]. The benefits of ACO algorithms were originally illustrated with the
Traveling Salesman Problem (TSP). Initial research showed that AS algorithms
specifically would work effectively for small TSP problems. However, once the
size of the problem increases to realistic sizes, the algorithm’s performance
decreases and other techniques (both stochastic and heuristic) out-perform the
ACO algorithm.

Much work has been conducted since to try to improve performance with
ACO techniques [6]. Changing some parameters and update methods succeeds
in improving the results obtained. For tested traveling salesman problems of
discrete sizes between 43 and 783, the maximum percent error difference between
a genetic algorithm (GA) and an improved ACO implementation is 0.4%, which is
not a significant difference between the performance of the GA and the
enhanced ACO implementation.

TSP differs from the routing/topology problem presented here in one signifi- 
cant way. For the TSP problem, the network topology is given as an input. With 
SAN design the network topology is unknown and is determined as part of the 
design/flow routing process. 

The TSP implementation does not permit construction of invalid paths. However,
work reported in [12] uses a version of ACO to solve a generic constraint
satisfaction problem represented as the optimisation of a set of variables. In this 
case the goal is to find a feasible solution, when there are many possible infeasi- 
ble solutions. In this case there is no difference in terms of quality between the 
varying feasible solutions. This implementation also includes a local search com- 
ponent surrounding the solutions produced by each ant. Results show that the 
algorithm described performs favourably against a random walk based algorithm 
when the number of variables to optimise is high. 

The research discussed above shows that ACO techniques specifically can be
useful for problems that require minimisation of some value and/or those whose
solutions require certain conditions to be met. Since each of these has parallels
in the SAN design problem, we have also experimented with the use of an ACO
based algorithm to create data flow routes and SAN topology. The specifics of
the algorithm used, and the resulting networks are discussed in the following 
section. 



3 How Biological Ants Find a Shortest Path 

The inspiration for Ant Colony Optimisation and related algorithms comes from
knowledge of path selection in ant species using pheromone trails, especially
Linepithema humile. This Argentinian ant deposits some amount of pheromone
(a chemical whose concentration is recognisable by other ants of the same species)
as it walks. When facing a choice of routes, an ant will take the path which
has a higher concentration of pheromone. That is, the path which has had the 
largest number of ants traveling on it previously. If multiple paths have the 
same amount of pheromone, each of the equivalent paths is chosen with equal 
probability. Occasionally, an ant will ignore the pheromone concentrations, and 
simply pick a path randomly. It is generally accepted that pheromone evaporates 
at some rate over time. 

These simple rules result in the ‘emergent behaviour’ of a colony finding the 
shortest path between two points. This is easily shown with a commonly cited set 
of illustrations (from [10]) of four ants traveling between a nest and food source. 
In Figure 1(a), we can think of ants 1 and 2 as departing from the nest, and ants 
3 and 4 as departing from the food source. In each case, at the start, there is no 
pheromone on either of the possible paths. Therefore each ant chooses a route 
randomly, each with equal probability. Ants 1 and 3 choose the upper path (A) 
and 2 and 4 choose the lower path (B). Some time later, the positions of each 
of the ants and their corresponding pheromone trails are shown in Figure 1(b). 
Since they have chosen the shorter path, ants 2 and 4 have reached their goal. 
Meanwhile, ants 1 and 3 are still on their way. Any ant that departs from either 
the food source or the nest will, at this point in time, more likely choose path 
B over A, as there is greater concentration of pheromone on the lower path. A 
majority of ants will then use the shortest route from the nest to the food source. 
The small number of ants that ignore the pheromone result in the exploration of 
new, potentially better routes which can help when a formerly optimal route has 
been blocked, or a new, potentially better route has been created. Although not 
every ant finds the optimal path, overall the colony tends to find the shortest 
route and is able to adapt to changing circumstances. 
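The feedback loop described above can be reproduced with a small deterministic model. The Python sketch below is an illustration only: it tracks the expected share of ants on each of two paths rather than individual ants, and it assumes that pheromone is deposited at a rate proportional to 1/length (shorter trips complete more often), that a fixed fraction `explore` of ants ignores the pheromone, and that a fraction `rho` evaporates per step. None of these parameter values come from [10].

```python
def simulate(len_a=2.0, len_b=1.0, ants=10, rho=0.1, explore=0.05, steps=200):
    """Mean-field two-path model; returns (tau_a, tau_b, share_b)."""
    tau_a = tau_b = 1.0   # equal pheromone on both paths at the start
    share_b = 0.5
    for _ in range(steps):
        # Ants that follow the pheromone pick B in proportion to its share.
        follow_b = tau_b / (tau_a + tau_b)
        # A small fraction of ants ignores the pheromone and picks at random.
        share_b = (1.0 - explore) * follow_b + explore * 0.5
        share_a = 1.0 - share_b
        # Evaporate, then deposit. Deposit per step is scaled by 1 / length
        # (assumption) because trips on the shorter path finish sooner.
        tau_a = (1.0 - rho) * tau_a + ants * share_a / len_a
        tau_b = (1.0 - rho) * tau_b + ants * share_b / len_b
    return tau_a, tau_b, share_b


tau_a, tau_b, share_b = simulate()
print(f"share of ants on the shorter path B: {share_b:.2f}")
```

Under these assumptions the share of ants on the shorter path B grows from one half to a clear majority, while the `explore` fraction keeps a few ants on path A, which is what allows the colony to react if the paths change.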




Fig. 1. An illustration of pheromone trail following in ants. As in [10] 



4 Practical Implementation 

The colony rules described above can be adapted to any other problem which can
be translated into a shortest path problem. Several other non-SAN related
ant-inspired implementations have been mentioned previously in Section 2. Once
we can find a representation of the given problem which corresponds to a
shortest path problem, we can use a modification of the method used by
Linepithema humile to find the shortest path within our problem.

In general, previous work with ACO algorithms, and particularly Ant System
(AS) algorithms (e.g. [7], [6], [12], [8], [4], [11]), has taken the following
approach. First, we create a colony of ants, each of which chooses a path through
some space. The elementary steps through this space are routes. An ant’s path 
is the sum of all the routes it has taken. An ant picks a particular route based 
on a weighted combination of the heuristic desirability and pheromone concen- 
tration of that particular route. Once each ant in the colony (or generation) has 
chosen its path, the pheromone concentration for each route is adjusted based 
on the varying successes of the particular paths of the ants, and is recorded for 
use by the next generation. Generally, the heuristic desirability of a particular 
route does not change between generations, but the pheromone concentration 
on each route will. This is then repeated for some number of generations. While
the desirability aspect of the ACO implementation is not in any way inspired by
biological ants, the ACO algorithm does not perform as well without it [3].

We can translate the SAN design problem into a path optimisation problem
in the following manner. We assume a set $F$ of flows and a set $N$ of fabric
nodes¹. The number of available fabric nodes $|N|$ is generally greater than the
number of fabric nodes necessary to solve the problem. It will be possible to use
all of the available fabric nodes, none of them, or some number in-between. We
will limit the solutions to single-layered networks (that is, there is at most one
fabric node between a flow’s source and destination). A route in this context is
an assignment of a flow $f \in F$ to a fabric node $n \in N$. From the routes
chosen, a network topology will be inferred.

In order to choose its path, a particular ant will iterate through the set of 
required flows ordered by decreasing bandwidth requirements, choosing a fabric 
node (or direct connection) for each of the flows to pass through. As the path is 
constructed the set of possible nodes to which the next flow can be assigned is 
restricted to only those nodes which will result in a feasible network. If there are 
no feasible fabric node choices and the set of possible nodes is therefore empty, 
the ant is terminated and ignored. We can evaluate the resulting network in terms 
of cost. This step is comparable to calculating the length of a particular tour. 
At the end of a generation, each ant then updates the pheromone concentration 
values along its path according to some predetermined rule. 
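The path construction described above can be sketched in Python as follows. This is a simplified illustration: feasibility is reduced to a bandwidth check on the fabric node (port limits and the direct-connection choice are omitted), and the random choice among feasible nodes stands in for the probabilistic rule of the next section. The names `construct_path`, `capacity`, and `choose` are hypothetical.

```python
import random


def construct_path(flows, capacity, choose=random.choice):
    """Assign each flow to a fabric node.

    flows:    {flow_id: bandwidth requirement}
    capacity: {node_id: available bandwidth}
    Returns {flow_id: node_id}, or None if the ant reaches a dead end.
    """
    remaining = dict(capacity)
    path = {}
    # Visit the flows in order of decreasing bandwidth requirement.
    for f in sorted(flows, key=flows.get, reverse=True):
        # Simplified feasibility test (assumption): bandwidth only.
        feasible = [n for n in remaining if remaining[n] >= flows[f]]
        if not feasible:
            return None   # dead end: the ant is terminated and ignored
        n = choose(feasible)
        remaining[n] -= flows[f]
        path[f] = n
    return path


flows = {"f1": 30.0, "f2": 70.0, "f3": 50.0}


def first(options):
    # Deterministic stand-in for the random choice, used for the example.
    return options[0]


path = construct_path(flows, {"n1": 100.0, "n2": 60.0}, choose=first)
dead = construct_path(flows, {"n1": 100.0}, choose=first)
```

With two nodes the ant finds a complete assignment; with a single node of bandwidth 100 it runs out of feasible choices at the second flow and is discarded.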

¹ We assume there are $|N| - 1$ fabric nodes in the fabric node pool. The last
choice available (generally denoted as 0) corresponds to the choice of a direct
connection. Henceforth, when referring to fabric node choices, we will intend to
include the direct connection among them.

The exact probability of a particular ant choosing a particular fabric node,
for a particular flow, during a particular generation, is described in the following
section. As a general rule, the ACO algorithms choose a particular route with a
probability that may be expressed as a combination of both pheromone
concentration $\tau$ and heuristic desirability $\eta$ as defined in Equation 1.
This is adopted from [7] and [3] for the SAN problem. At generation $t$, for a
flow $f \in F$, an ant $k \in K$ chooses a fabric node $n \in N$ to connect the
flow’s start and end points with the probability given by the following formula.



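$$p^{k}_{fn}(t) = \frac{[\tau_{fn}(t)]^{\alpha}\,[\eta_{fn}]^{\beta}}{\sum_{m \in N_{feasible}} [\tau_{fm}(t)]^{\alpha}\,[\eta_{fm}]^{\beta}} \qquad (1)$$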



We assume that we have $|K|$ ants in each generation $t$. The exponents $\alpha$
and $\beta$ are constant for the duration of each run. We divide by the sum over
all feasible choices $N_{feasible} \subseteq N$, in order to ensure that our
probabilities sum to one. $N_{feasible}$ denotes the set of feasible nodes, i.e.
node choices that are consistent with the path corresponding to a buildable
network. The probability of selecting an infeasible node is zero, and Equation 1
gives the probability of selecting each of the feasible nodes. If there are no
feasible nodes, the ant has reached a ‘dead end’, and its path is terminated.
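The selection rule can be sketched in Python as follows, assuming the usual Ant System weighting in which each feasible node is weighted by its pheromone value raised to the exponent alpha times its heuristic value raised to the exponent beta. The function name and the dictionary layout are hypothetical.

```python
def choice_probabilities(tau, eta, feasible, alpha, beta):
    """Probability of choosing each feasible node for the current flow.

    tau, eta: {node: value} for the current flow
    feasible: nodes that keep the network buildable
    Infeasible nodes are simply absent from the result (probability zero).
    """
    weights = {n: (tau[n] ** alpha) * (eta[n] ** beta) for n in feasible}
    total = sum(weights.values())
    if total == 0:
        return {}   # no feasible node: the ant has reached a dead end
    return {n: w / total for n, w in weights.items()}


tau = {0: 1.0, 1: 2.0, 2: 4.0}
eta = {0: 1.0, 1: 2.0, 2: 1.0}

# Node 2 is infeasible here; beta is negative, so a larger eta lowers the weight.
probs = choice_probabilities(tau, eta, feasible=[0, 1], alpha=1.0, beta=-1.0)
```

In this example node 1 has twice the pheromone of node 0 but also twice the heuristic value, so with a negative beta the two effects cancel and both feasible nodes are chosen with probability 0.5.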



4.1 Determining Pheromone Matrix Values 

We store the changing pheromone values in a matrix which has dimensions $|F|$
by $|N|$. The pheromone value of a particular route (flow $f$ assigned to a
fabric node $n$) is then determined by accessing the matrix at location
$(f, n)$. Each position in the matrix is initialised at the start of a run with
some set value. At the end of each generation, the pheromone levels are updated
using Equations 2 and 3, each of which is adopted for the SAN problem from [3].
In each case $\rho$ is some constant representing the ‘coefficient of decay’.
The coefficient of decay corresponds to some rate of evaporation for the
pheromone. A particular route’s pheromone concentration is only updated by a
particular ant if the ant’s path has passed through that location.

Equations 2 and 3 correspond to a form of elitism. We use Equation 2 for
all routes $(f, n)$ which have been traversed by the best ant in the generation.
Equation 3 is used for all other routes. $D_{k_b}$ is either set to a constant,
or is set such that the increase in pheromone at that point is inversely
proportional to the cost of the solution specified by the best ant ($k_b$). For
more details, see [5]. In this implementation there is no minimum or maximum for
the pheromone values, although in some cases we will cap the value of
$D_{k_b}$. The determination of $k_b$ (and therefore $D_{k_b}$) is discussed
further in Section 4.3.



$$\tau_{fn}(t+1) = (1 - \rho)\,\tau_{fn}(t) + D_{k_b} \qquad (2)$$

$$\tau_{fn}(t+1) = (1 - \rho)\,\tau_{fn}(t) \qquad (3)$$
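The two update rules can be applied together in one pass over the pheromone matrix. The Python sketch below is illustrative only: the matrix is stored as a dictionary keyed by (flow, node), and the deposit `delta` is passed in as a plain number, whether it is a constant or derived from the cost of the best ant's network.

```python
def update_pheromone(tau, best_path, rho, delta):
    """Apply Equation 2 to routes used by the best ant, Equation 3 elsewhere.

    tau:       {(flow, node): pheromone value}
    best_path: {flow: node} chosen by the best ant of the generation
    rho:       coefficient of decay
    delta:     deposit added on the best ant's routes
    """
    best_routes = set(best_path.items())
    return {
        route: (1.0 - rho) * value + (delta if route in best_routes else 0.0)
        for route, value in tau.items()
    }


tau = {("f1", "n1"): 1.0, ("f1", "n2"): 1.0}
new_tau = update_pheromone(tau, {"f1": "n1"}, rho=0.5, delta=0.25)
```

Here the route used by the best ant ends the generation at 0.75, while the unused route simply decays to 0.5.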



4.2 Determining Heuristic Desirability Values 

For our purposes, we will define $\eta$ as an un-desirability, and choose
$\beta$ to be negative, so that the term $\eta^{\beta}$ in Equation 1 represents
a weighted desirability. We want the un-desirability to express the following
ideas:

1. Adding a fabric node early on is not necessarily bad, but adding a fabric 
node toward the end of the list of flows would probably result in too much 
left over fabric node bandwidth. 

2. If there is already a lot of unallocated fabric node bandwidth, there is no 
need for another fabric node. 

3. Port packing (for a particular host or device) is harder if lots of flow remains
to be allocated.

4. Port packing is easier if there is lots of port bandwidth (i.e. lots of ports)
available.

5. Port re-usability is greater if there is more available bandwidth on the fabric
node that carries the flow we have just added.

Un-desirability $\eta$ will be defined as a combination of the un-desirability
of adding a particular new fabric node ($U_n$) and the un-desirability of adding
new ports ($U_{p_h}$, $U_{p_d}$, or $U_{p_n}$) on a host, device, or fabric node
respectively. The heuristic un-desirability of adding a particular new fabric
node ($n$) is influenced by ideas (1) and (2) above, and is given as a formula in
Equation 4.




Here, $B^{rem}(m)$ is the remaining bandwidth available on the already used
fabric nodes $N_{used}$, $b(g)$ is the bandwidth required for flow $g$, and the
sum $\sum_{g=f+1}$ is over flows not yet assigned.

The un-desirability $U_{p_h}$, $U_{p_d}$, or $U_{p_n}$ of adding a new port is
influenced by ideas (3), (4), and (5) above, and is defined in Equation 5.

$$U_{p_x} = \frac{\sum_{g=f+1} b(g) \cdot K_x}{B_{used\_ports} \cdot \sum_{m \in N_{used}} B^{rem}(m) + C_d}, \qquad x \in \{h, d, n\} \qquad (5)$$

In this equation $B_{used\_ports}$ is the sum over each node of the remaining
bandwidth of each of its used ports, $C_d$ is some constant representing the
cost of a direct connection, and $K_h$, $K_d$, and $K_n$ are constants associated
with each type of fabric that has ports. $K$ reinforces the difference in cost
associated with adding a switch port versus a host port versus a device port.

The un-desirability of a new route is defined to be one plus the sum of all
relevant un-desirabilities. For example, if the route does not require the addition
of any new fabric elements, then $\eta = 1$; if it only requires that a new host
port is added, $\eta = U_{p_h} + 1$; if the route requires that a new fabric node
and fabric node port are added to the network, $\eta = U_n + U_{p_n} + 1$, etc.
We will not include a monetary cost term in the heuristics. That is, there will
be no distinction, in terms of $\eta$, made between different fabric node choices
because of the monetary cost associated with that choice. This is something that
is expected to be learned by the system. Since only the best ant updates the
pheromone table, and the monetary cost of the network specified by that ant is
incorporated into the updated pheromone values, as in Equation 2, more cost
effective routes are reinforced through $\tau$. For further details, see [5].
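The rule "one plus the sum of all relevant un-desirabilities" and the effect of a negative exponent can be checked with a few lines of Python. The sketch below takes the individual un-desirabilities as given numbers rather than computing them from Equations 4 and 5; the function name, the argument names, and the values used are hypothetical.

```python
def undesirability(u_node=0.0, u_host_port=0.0, u_device_port=0.0, u_node_port=0.0):
    """One plus the un-desirability of every new element the route requires.

    An argument is left at 0.0 when the corresponding element is not needed.
    """
    return 1.0 + u_node + u_host_port + u_device_port + u_node_port


beta = -2.0   # negative exponent, so a larger un-desirability lowers the weight

eta_reuse = undesirability()                         # no new fabric elements
eta_new_host_port = undesirability(u_host_port=1.0)  # one new host port

weight_reuse = eta_reuse ** beta
weight_new_host_port = eta_new_host_port ** beta
```

A route that reuses existing elements keeps the weight 1.0, while the route that needs a new host port drops to 0.25, so the ant is biased toward the former before any pheromone is taken into account.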



4.3 Finding the Best Ant in the Colony 

The best ant in a generation is determined by comparing the number of port
violations, the amount of over-allocated bandwidth, and the monetary cost of
each ant’s resulting network. We want to make a distinction between host or
device port violations and fabric node port violations, because it is harder to
resolve host/device port violations than fabric node port violations, since a
fabric node generally can support more ports (8-16) than a host or device (2).
This makes one host/device port violation more significant than a fabric node
port violation. When two ants are compared, we first look at the relative number
of host/device port violations. An ant specifying a network with a smaller
number of this type of port violation is considered to be better than the ant
with a greater number. If two ants have the same number of host/device port
violations, then the number of fabric node port violations is examined. If
necessary the amount of overused bandwidth is compared, followed by the monetary
cost of each of the networks. Determination of the best ant in a colony is used
when updating the pheromone matrix using Equations 2 and 3.
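The comparison order described above is a lexicographic ordering, which can be written compactly as a tuple key. The Python sketch below is illustrative; the record type `AntResult`, its field names, and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class AntResult:
    host_device_violations: int   # compared first
    fabric_violations: int        # compared second
    overused_bandwidth: float     # compared third
    cost: float                   # monetary cost, compared last


def rank_key(r):
    # Tuples compare element by element, which matches the order of
    # comparisons described in the text. Smaller is better throughout.
    return (r.host_device_violations, r.fabric_violations,
            r.overused_bandwidth, r.cost)


ants = [
    AntResult(1, 0, 0.0, 1000.0),   # cheapest, but has a host/device violation
    AntResult(0, 2, 0.0, 9000.0),
    AntResult(0, 2, 0.0, 7500.0),   # ties with the previous ant until cost
]
best = min(ants, key=rank_key)
```

The cheapest network loses because of its host/device port violation, and the tie between the other two ants is broken only at the last criterion, monetary cost.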

5 Experimental Results 

5.1 Test Data 

Two types of test data were used in the experiments described in this section. 
The first set of experiments used data from the Appia project. In the second set 
of experiments we used a problem set that was created specifically, although the 
generation algorithm was the same as that used to produce the original Appia 
problem set. 

The Appia project [14] has stochastically generated a set of test cases
classified into nine distinct groups. Each test case has a possible solution,
though the optimal solution is not known. Each group has two specific
characteristics, one which represents the number of hosts and the number of
devices, and the other which represents the proportional number of flows between
particular hosts and devices. There were three possible categories of size:
problems with 10 hosts and 10 devices, 20 hosts and 100 devices, and 50 hosts
and 100 devices. The flows were then characterised by three labels: sparse (a
small number of flows generally uniformly distributed across possible
host-device pairs), dense (a large number of flows generally uniformly
distributed) or clustered (a small number of host-device pairs carry most of
the flow requirements).







This same test set was applied to the ant-inspired algorithm described above
in order to measure its effectiveness against the more traditional algorithms
developed and applied in the Appia project. Each grouping of sparse, clustered,
and dense problems was numbered from 1-30. The first 10 represent problems
whose hosts/devices have a higher maximum percentage of port saturation (i.e.
the proportion of the maximum bandwidth that may be used on a particular
host or device).

Work published previously by the Appia project ([14] and [9]) described the
algorithm which was used to generate random SAN design problems with particular
properties. Using these descriptions, a similar generator was written and
used to produce additional sample problems. There are several variables that
may be specified to the generator. These include size (as discussed previously),
port saturation (the maximum percentage of capacity on each port per host or
device), and a maximum number of flows which may be assigned to each host
or device. The characterisation as sparse, clustered, or dense was only made
after the design problem had been generated. A problem set was generated that
allowed port saturation to range from 41% to 99% and maximum flows per host
or device to vary from 3 (roughly equivalent to a sparse network in the original
set) to 27 (roughly equivalent to a dense network) in a set that specified 10
hosts and 10 devices. Flows were continually created until no more were possible
because either there was not enough spare bandwidth remaining on the host
or device ports, or each host and device had reached the maximum number of
flows permitted per host or device.
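
The generation loop described above can be sketched as follows. This is only an
illustrative sketch, not the Appia generator or the generator used in these
experiments: the function name, the single aggregate port per node, the nominal
port capacity, and the uniform choice of flow bandwidth are all assumptions made
for the example.

```python
import random


def generate_problem(n_hosts, n_devices, port_saturation, max_flows,
                     port_capacity=100.0, min_flow=1.0, max_flow=20.0,
                     seed=None):
    """Create random host-device flows until no further flow fits.

    Assumptions (not from the paper): each node has one aggregate port of
    `port_capacity`, and flow bandwidths are drawn uniformly at random.
    """
    rng = random.Random(seed)
    budget = port_capacity * port_saturation  # usable bandwidth per node
    nodes = ([("host", i) for i in range(n_hosts)]
             + [("device", j) for j in range(n_devices)])
    spare = {node: budget for node in nodes}
    count = {node: 0 for node in nodes}
    flows = []

    def open_nodes(kind):
        # Nodes that still have spare bandwidth and spare flow slots.
        return [n for n in nodes
                if n[0] == kind and count[n] < max_flows
                and spare[n] >= min_flow]

    while True:
        hosts, devices = open_nodes("host"), open_nodes("device")
        if not hosts or not devices:
            break  # no more flows are possible
        h, d = rng.choice(hosts), rng.choice(devices)
        bandwidth = rng.uniform(min_flow, min(max_flow, spare[h], spare[d]))
        flows.append((h[1], d[1], bandwidth))
        for node in (h, d):
            spare[node] -= bandwidth
            count[node] += 1
    return flows
```

Calling `generate_problem(10, 10, 0.41, 3, seed=1)` would correspond to a
sparse-like 10 by 10 problem under these assumptions; raising `max_flows`
towards 27 moves the output towards a dense problem.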

The main differences between the first and this second set of random SAN
problems lie in two factors. First, the exact specification of each network
problem generated is known. Due to the second-hand nature of the first data
set, its configuration parameters can only be determined with an educated guess.
Secondly, while there is a smaller sample size for each type of problem, the
variation in the two parameters (port saturation and maximum number of flows)
is greater. For example, in the original set there were only two port
saturations, 41% and 99%. In this second set the port saturations vary between
41% and 99% at 5% intervals. The maximum number of flows per host also varies
more widely, ranging from 3 to 27 in steps of 2 for a 10 host by 10 device
problem.

5.2 Results 

In order to determine appropriate parameters for $\alpha$ and $\beta$, several
experiments were run on 10 by 10 networks. In each case we used elitism (as
expressed via Equations 2 and 3) to update the pheromone matrix. The results of
these initial experiments are not shown (see [5] for data), but it was found
that generally a higher weighting for the heuristics over the pheromone
concentration serves to guide the algorithm to a better solution. For the sparse
problems this was not as clear, as there does not seem to be one combination of
variables that consistently outperforms the FlowMerge results. However, when
looking at the clustered problems, the distinction becomes more obvious and the
values of $\alpha = 0.5$; $\beta = -1.0$ or $\alpha = 0.1$; $\beta = -1.0$
consistently outperform both Appia solutions, as well as the other combinations
of $\alpha$ and $\beta$.



Table 1. Comparison between average solution monetary costs for Ants, FlowMerge,
and QuickBuilder algorithms. Negative values indicate the Ants did not perform
as well as the specific Appia algorithm.

                      Average Cost                   min(FM,QB)-Ants
Type        n    GA         FM          QB          Average    Standard Deviation
Sparse      14   $58,167    $53,194     $60,571     $-6,710    $10,169
Clustered   15   $49,339    $71,339     $79,699     $19,789    $16,949
Dense        7   $86,499    $143,514    $111,823    $18,627    $28,158



Fig. 2. Percent improvement of ants solutions over the lesser of Appia FlowMerge
or QuickBuilder solutions after 1500 generations (horizontal axis: %
improvement, from -60 to 60). Data is only presented for those problems which
had valid solutions, with $\alpha = 0.5$ and $\beta = -1.0$.



In both the sparse and dense computations, there were some cases in which a
buildable solution was not found in the maximum number of generations (1500,
with a population size of 10). Viable solutions were more likely to occur in the
dense experiments when more weighting was given to the heuristics (i.e. a
higher absolute $\beta$ value), although these variable settings
($|\beta| > \alpha$) resulted in one unviable solution for the sparse problems.
These results are contrary to those presented in [11], which show that the
heuristics do not have an important role to play in the production of good
solutions for the university timetabling problem. The heuristics may be more
important in this case due to the nature of the search space and the limited
number of buildable solutions.

The results for 15 sparse problems, 15 clustered problems, and 15 dense
problems with constants $\alpha = 0.5$; $\beta = -1.0$ are shown in Table 1.
Solutions were found for all the clustered problems, but within the 1500
generations allowed, no solution was found for 1 sparse problem and 8 dense
problems.

Figure 2 shows the percent improvement of the ants algorithm over the 
cheaper of the two Appia algorithms. Although the sparse problem solutions 
did not generally beat the Appia solutions, the clustered solutions always did. 
The dense solutions were better than the Appia solutions in four of the seven 
viable solutions. 
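
As a small worked illustration of the comparison, the percent improvement over
the cheaper Appia design can be computed as below. The normalisation by the
cheaper Appia cost is an assumption about how the percentage is formed, and the
function name is hypothetical. The example values are the clustered averages
from Table 1, used only to show the arithmetic; the figure itself reports
per-problem values.

```python
def percent_improvement(ants_cost, fm_cost, qb_cost):
    """Percent improvement of the ants cost over the cheaper Appia cost.

    Positive values mean the ants solution is cheaper. Dividing by the
    cheaper Appia cost is an assumption made for this sketch.
    """
    baseline = min(fm_cost, qb_cost)
    return 100.0 * (baseline - ants_cost) / baseline


# Clustered averages from Table 1: ants $49,339, FM $71,339, QB $79,699.
clustered = percent_improvement(49_339, 71_339, 79_699)
print(f"{clustered:.1f}% improvement")
```

A negative result, as obtained from the sparse averages, indicates that the
cheaper Appia algorithm produced the less expensive design.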

The ACO algorithm’s good performance on the clustered networks is probably
due to the heuristic’s encouragement of fabric node re-use. When we compare
a design by the ACO implementation with a QuickBuilder architecture (Figure
3), we see that the Appia solution uses more fabric nodes than necessary,
whereas the ant-inspired algorithm has routed all flows through the same switch.




Fig. 3. On the left, a SAN designed by the Appia QuickBuilder algorithm for the
clustered sample problem 8. On the right, a SAN designed by the ACO algorithm.
The cost saving here over QuickBuilder is $58,080.



5.3 Further Explorations in Ant Space 

The ant algorithms seem to perform well for easy problems. However, perfor- 
mance degrades as the problems get bigger and the algorithm has trouble finding 
buildable solutions as the solution space gets larger. In order to try and isolate 
what it is about the particular network problem that becomes ‘hard’, experi- 
ments were run with the second test set described in Section 5.1. 

Figure 4 shows the improvement of an ant-based solution (with $\alpha = 0.5$ and
$\beta = -1.0$) over an Appia FlowMerge (or QuickBuilder) design as a proportion
of the FlowMerge (or QuickBuilder) cost. A zero value indicates both solutions
were equal. The variation in improvements appears to be proportional to the
total number of flows for the problem actually produced. However, this graph
slopes upward as the number of flows increases. As the problems get harder,
the improvement over QuickBuilder and FlowMerge results also increases. The
reasons for this are difficult to determine with certainty without some
knowledge of the optimal solution and the Appia algorithms’ performance compared
to the optimum. However, the improvement in the ants’ performance over that of
the Appia algorithms is probably due to a relative decrease in performance of
QuickBuilder and FlowMerge.

Fig. 4. Change in relative cost when compared against Appia FlowMerge (top) and
QuickBuilder (bottom). The y-axis is measured as a sum of all flows present in
the input problem. Each grid point’s z-value is a weighted average of the nearby
data points.

6 Conclusions 

The ant-based algorithm seems to find the general area of good solutions, as
found with other implementations of ACO. In Section 5 we have shown that
an appropriately configured ACO implementation can outperform the Appia
algorithms a majority of the time. It is possible to design cost-effective
storage area networks using an ant colony optimisation algorithm, and doing so
can result in solutions which provide significant cost savings over
Appia-designed networks.

There are several different directions in which the work presented here could
be taken in the future. First, in terms of the problem representation and
solution space, we made a specific decision to limit possible solutions to
single-layer networks. This may prevent the ant-based implementation from
finding a better solution. Further work may include allowing multi-layered
networks to be explored.

Previous ACO algorithm work (e.g. [12]) showed that local optimisation of
intermediate ant solutions can result in better solutions. It would be
interesting to see the effect of local search, perhaps using a genetic
algorithm, on the best ant solution in each generation.

The ant-based algorithms had difficulty finding buildable solutions for large
networks. This may be related to the heuristic component used by the ants. 
Although the heuristic works well for small problems, it is possible it is not 
appropriate for larger networks. Further exploration should include measurement 
of the feasibility of the current heuristics for larger networks. 

This work has addressed only some of the important issues of SAN design. 
There are also considerations of fault tolerance. Sometimes a client will request 
fault tolerance to be built into a SAN such that each flow has two possible routes 
in case of failure in a fabric element. To become more useful, the automated GA 
design approach will need to take into account these fault tolerant properties. 

Abstract. With the growth of computing power and the proliferation
of broadband access to the Internet, the use of media streaming has
become widespread. By using the P2P communication architecture,
media streaming can be expected to react smoothly to changes in network
conditions and user demands for media streams. To achieve continuous
and scalable media streaming, we proposed scalable media search and
retrieval methods in our previous work. However, through several
simulation experiments, we have shown that an LRU (Least Recently Used)
cache replacement algorithm cannot provide users with continuous media
play-out for unpopular media streams. In this paper, inspired by
biological systems, we propose a new algorithm that considers the
balance between supply and demand for media streams. Through several
simulation experiments, it has been shown that our proposed algorithm
can improve the completeness of media play-out compared with LRU.



1 Introduction 

With the growth of computing power and the proliferation of broadband access 
to the Internet, such as ADSL and FTTH, the use of media streaming has become
widespread. A user receives a media stream from an original media server
through the Internet and plays it out on his/her client system as it gradually 
arrives. However, on the current Internet, the major transport mechanism is still 
only the best effort service, which offers no guarantee on bandwidth, delay, and 
packet loss probability. Consequently, such a media streaming system cannot 
provide users with media streams in a continuous way. As a result, the perceived 
quality of media streams played out at the client system cannot satisfy the user’s 
demand because he or she experiences freezes, flickers, and long pauses. 

The proxy mechanism widely used in WWW systems offers low-delay deliv- 
ery of data by means of a “proxy server,” which is located near clients. The proxy 
server deposits multimedia data that have passed through it in its local buffer,
called the “cached buffer.” Then it provides cached data to users on demand in
place of the original content server. By applying the proxy mechanism to
streaming services, we believe that high-quality and low-delay streaming
services can be accomplished without introducing extra load on the system [1].
However, the
media servers and proxy servers are statically located in the network. Users dis- 
tant from those servers are still forced to retrieve a media stream over a long and 
unreliable connection to a server. If user demands on media and their locations 
in the network are known in advance, those servers can be deployed in appropri- 
ate locations. However, they cannot flexibly react to dynamic system changes, 
such as user movements and user demands for media streams. Furthermore, as 
the number of users increases, load concentration at the servers is unavoidable. 

Peer-to-Peer (P2P) is a new network paradigm designed to solve these prob- 
lems. In a P2P network, hosts, called peers, directly communicate with each other 
and exchange information without the mediation of servers. One typical exam- 
ple of P2P applications is a file-sharing system, such as Napster and Gnutella. 
Napster is one of the hybrid P2P applications. A consumer peer finds a desired 
file by sending an inquiry to a server that maintains the file information of peers. 
On the other hand, Gnutella is one of the pure P2P applications. Since there is 
no server, a consumer peer broadcasts a query message over the network to find 
a file. If a peer successfully finds a file, it retrieves the file directly from a peer 
holding the file. Thus, concentration of load on a specific point of the network 
can be avoided if files are well distributed in a P2P network. In addition, by
selecting a nearby peer from the set of file holders, a peer can retrieve a file
faster than with conventional client-server file sharing.

By using the P2P communication technique, media streaming can be expected
to react flexibly to network conditions. There have been several studies
on P2P media streaming [2, 3, 4, 5, 6]. Most of them construct an
application-level multicast tree whose root is an original media server while the
peers are intermediate nodes and leaves. Their schemes were designed for use 
in live broadcasting. Thus, they are effective when user demands are simultane- 
ous and concentrated on a specific media stream. However, when demands arise 
intermittently and peers request a variety of media streams, as in on-demand 
media streaming services, an efficient distribution tree cannot be constructed. 
Furthermore, the root of the tree, that is, a media server, is a single point of 
failure because such systems are based on the client-server architecture. 

We have proposed several methods for on-demand media streaming on pure
P2P networks where there is no server [7]. There are several issues to resolve
in accomplishing effective media streaming on pure P2P networks. Scalability is
the most important among them. Since there is no server that manages information
on peer and media locations, a peer has to find the desired media stream by
itself by emitting a query message into the network. Other peers in the network
reply to the query with a response message and relay the query to the
neighboring peers. Flooding, in which a peer relays a query to every neighboring
peer, is a powerful scheme for finding a desired media stream in a P2P network.
However, it has been pointed out that the flooding scheme lacks scalability
because the number of queries that a peer receives significantly increases with
the growth in the number of peers [8]. Especially when a media stream is divided
into blocks for efficient use of network bandwidth and cache buffer
[9, 10, 11, 12], a block-by-block search by flooding queries introduces much
load on the network and causes congestion. To tackle this problem, we proposed
two scalable per-block search methods. Taking into account the temporal order of
reference to video blocks, a peer sends a query message for a group of
consecutive blocks. The range of search is dynamically regulated based on the
preceding search result using two different algorithms.

Another problem in P2P streaming is continuity of media play-out. In media 
streaming services, continuous media play-out is the most important factor for 
users. To accomplish the continuity of media play-out, we have to consider a 
deadline of retrieval for each block. To retrieve a block by its corresponding play- 
out time, we have proposed methods to determine an appropriate provider peer 
(i.e., a peer having a cached block) from search results by taking into account the 
network conditions, such as the available bandwidth and the transfer delay. By 
retrieving a block as fast as possible, the remaining time can be used to retrieve 
the succeeding blocks from distant peers. 

Through several simulation experiments, we have shown that our mechanisms 
can accomplish continuous media play-out for popular media streams without 
introducing extra load on the system. However, we also have found that the 
completeness of media play-out deteriorates as the media popularity decreases. 
The main reason for the deterioration is the cache replacement algorithm. In our 
media streaming system, a peer stores retrieved media data into its cache buffer. 
If there is no room to store the media data, the peer has to perform a replacement 
on cached media data with the newly retrieved media data. Although LRU is a
simple and widely used cache replacement algorithm, it has been shown to fail
to provide continuous media play-out [7]. The reason is that popular media
streams are
cached excessively while unpopular media streams eventually disappear from the 
network. To improve the continuity of media play-out, in this paper, we propose 
an effective cache replacement algorithm that considers the supply and demand 
for media streams. Since there is no server, a peer has to conjecture the behavior 
of other peers by itself. We expect that each peer can assign an appropriate 
media stream to be replaced based on its local information and, as a result, that 
each media stream can be cached according to its corresponding popularity in 
the network. Thus, our proposed media streaming system is a distributed system
constructed by peers.

In biology, social insects, such as ants, also construct a distributed sys- 
tem [13]. In spite of the simplicity of their individuals, the social insect society 
presents a highly structured organization. As a result, the organization can ac- 
complish complex tasks that in some cases far exceed the individual capacities of 
a single insect. It has been pointed out that the study of social insect societies’ be- 
haviors and their self-organizing capacities is interesting for computer scientists 
because it provides models of distributed organization that are useful to solve 
difficult optimization and distributed control problems. Therefore, we can
expect that a bio-inspired mechanism would be applicable to our media-streaming
system by regarding each peer as an insect. In particular, a recently proposed
model of division of labor in a colony of primitively eusocial wasps, based on a
simple reinforcement of response thresholds, can be transformed into a
decentralized adaptive algorithm of task allocation [14].

In the model of the division of labor, the ratio of individuals that perform a
task is adjusted in a fully distributed and self-organizing manner. The demand
to perform a task increases as time passes and decreases when it is performed.
The probability that an individual $i$ performs a task is given by the demand,
i.e., stimulus $s$, and the response threshold $\theta_i$ as
$\frac{s^2}{s^2 + \theta_i^2}$, for example. When individual $i$ performs the
task, the threshold to the task is decreased and thus this individual tends to
devote itself to the task. After performing the task several times, it
becomes a specialist of the task. Otherwise, the threshold is increased. Through
threshold adaptation without direct interactions among individuals, the ratio of
individuals that perform a specific task is eventually adjusted to some
appropriate level. As a result, there would be two distinct groups that show
different behaviors toward the task, i.e., one performing the task and the other
hesitating to perform the task. When individuals performing the task are
withdrawn, the associated demand increases and so does the intensity of the
stimulus. Eventually, the stimulus reaches the response thresholds of the
individuals of the other group, i.e., those that are not specialized for that
task. Some of these individuals are stimulated to perform the task, their
thresholds decrease, and finally they become specialized for the task. Finally,
the ratio of individuals with regard to task allocations reaches the appropriate
level again.
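
The response-threshold rule in this model can be sketched in a few lines. The
response probability below is the form s^2 / (s^2 + theta_i^2) mentioned in the
text; the step sizes `xi` and `phi` and the threshold bounds are hypothetical
values chosen only for illustration.

```python
import random


def response_probability(stimulus, threshold):
    """Probability of performing the task: s^2 / (s^2 + theta^2)."""
    s2 = stimulus * stimulus
    return s2 / (s2 + threshold * threshold)


def update_threshold(threshold, performed, xi=0.1, phi=0.05,
                     lowest=0.01, highest=10.0):
    """Lower the threshold after performing the task, raise it otherwise.

    xi, phi and the bounds are illustrative assumptions, not paper values.
    """
    threshold += -xi if performed else phi
    return min(highest, max(lowest, threshold))


def step(thresholds, stimulus, rng=random):
    """One round: every individual decides, then adapts its own threshold."""
    decisions = [rng.random() < response_probability(stimulus, t)
                 for t in thresholds]
    adapted = [update_threshold(t, d) for t, d in zip(thresholds, decisions)]
    return decisions, adapted
```

Repeating `step` while the stimulus rises when few individuals act and falls
when many act would be one way to observe the specialization described above;
that feedback loop is left out of this sketch.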

By regarding the replacement of media streams as a task, we propose a
bio-inspired cache replacement algorithm based on the division of labor and task
allocation. We employ the ratio of supply to demand for a media stream as the
stimulus. Since each peer relays the query and response messages to its
neighboring peers, it can passively obtain information on supply and demand
without introducing extra signaling traffic on the system. It estimates the
demand for a media stream from the number of queries for the media stream
received from other peers, and the supply of a media stream from the number of
times it appears in the response messages. Then, based on the stimulus, it
selects a media stream to be replaced in a probabilistic way. The selected media
stream is replaced on a block-by-block basis from the end of the media stream.
Since the threshold of the victim is decreased, a media stream tends to be
discarded often and sequentially once it is chosen as a victim. Although a
deterministic approach is also applicable to this problem, it cannot perform
effectively without complex parameter settings. On the other hand, our proposed
cache replacement algorithm is parameter-insensitive since each peer dynamically
changes the response threshold in accordance with the information obtained from
the network environment.

Through several simulation experiments, we evaluated the proposed cache 
replacement algorithm in terms of the completeness of media play-out and the 
insensitivity to parameter setting. 

The rest of the paper is organized as follows. In Section 2, we discuss our
media streaming on P2P networks. We give an overview of our streaming system
on P2P networks, that is, per-group based search and retrieval of media
streams, in Subsection 2.1. Then, we introduce search and retrieval methods to
accomplish scalable and continuous media streaming in Subsections 2.2 and 2.3.
Furthermore, we propose a bio-inspired cache replacement algorithm in
Subsection 2.4. Next, in Section 3, we evaluate our proposed cache replacement
algorithm through several simulation experiments. Finally, we conclude the paper
and describe future work in Section 4.



2 Media Streaming on P2P Networks 

A peer participating in our system first joins a logical P2P network for the 
media streaming. For efficient use of network bandwidth and cache buffer, a 
media stream is divided into blocks. A peer searches, retrieves, and stores a
media stream on a block-by-block basis. In this section, we introduce scalable
search methods to find desired blocks and algorithms to determine an optimum 
provider peer from the search results. Finally, a bio-inspired cache replacement 
algorithm that takes into account the balance between supply and demand for 
media streams is given. 

2.1 Per-group Based Block Search 

In our system, a peer retrieves a media stream and plays it out on a
block-by-block basis. However, a block-by-block search increases the number
of queries that are transferred on the network and causes network congestion.
To tackle this problem, taking into account the temporal order of reference in
a media stream, our mechanism employs a per-group search to accomplish a
media search that scales with the number of peers.

A peer sends out a query message for every $N$ consecutive blocks, called a
round. Figure 1 illustrates an example of $N = 4$. $P_a$, $P_b$, $P_c$, and
$P_d$ indicate peers within the range of the propagation of query messages.
Numbers in parentheses next to peers stand for identifiers of the blocks that a
peer has. At time $T_i(1)$, a query message for blocks from 1 to 4 is sent out
from $P$ to the closest peer $P_a$. Since $P_a$ has the second block out of the
four requested blocks, it returns a response message. It also relays the query
to the neighboring peers $P_b$ and $P_c$. $P_b$ also replies with a response
message to $P$. Since $P_c$ does not have any of the four blocks, it only relays
the query to $P_d$. Finally, $P_d$ sends back a response message. $P$ determines
a provider peer for each block in the round from the search results obtained by
the query. It takes two RTTs (Round Trip Times) from the beginning of the search
to the start of reception of the first block of the round. To accomplish
continuous media play-out, $P$ sends a query for the next round at a time that
is $2\,RTT_{worst}$ earlier than the start time of the next round.
$RTT_{worst}$ is the RTT to the most distant peer among the peers that returned
response messages in the current round.
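
The timing rule for issuing the next round's query can be written as a short
sketch. The function name and the representation of measured RTTs as a list of
seconds are assumptions made for the example.

```python
def next_query_time(next_round_start, response_rtts):
    """Time at which to send the query for the next round.

    The query is sent 2 * RTT_worst before the next round starts, where
    RTT_worst is the largest RTT among peers that responded in the
    current round. Times and RTTs are in seconds (an assumption).
    """
    if not response_rtts:
        raise ValueError("no response was received in the current round")
    rtt_worst = max(response_rtts)
    return next_round_start - 2.0 * rtt_worst
```

For instance, with responses measured at 50 ms, 200 ms and 100 ms and the next
round starting at t = 10 s, the query would be sent 0.4 s before that start.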

Fig. 1. Example of scheduling for search and retrieval (Round 1, Blocks 1 to 4;
requests are drawn upward and transmissions downward)



2.2 Block Search Methods Based on Search Results 

Since each peer retrieves a media stream sequentially from the beginning to 
the end, we can expect that a peer that sent back a response message for the
current round has some blocks of the next round. In our mechanisms, a peer 
tries flooding at the first round. However, in the following rounds, it searches 
blocks in a scalable way based on the search results of the previous round. 

A query message consists of a query identifier, a media identifier, a pair of
block identifiers to specify the range of blocks needed, i.e., $(1, N)$, a time
stamp, and a TTL (Time To Live). A peer that has any blocks in the specified
range sends back a response message. A response message reaches the querying
peer through the same path that the query message traversed, but in the reverse
direction. The response message contains a list of all cached blocks, the TTL
value stored in the received query, and the sum of the time stamp in the query
and the processing time of the query. Each entry of the block list consists of a
media identifier, a block number, and a block size. If the TTL is zero, the
query message is discarded. Otherwise, after decreasing the TTL by one, the
query message is relayed to neighboring peers except for the one from which it
was received. In the case of Gnutella, a fixed TTL of seven is used. By
regulating the TTL, the load of finding a file can be reduced. We call the
flooding scheme with a fixed TTL of 7, which is used in Gnutella, full flooding,
and that with a limited TTL based on the search results, limited flooding.
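
The query handling just described can be sketched as follows. The class and
field names are hypothetical, block sizes are omitted from the block list, and
the sketch assumes a peer still answers a query whose TTL has reached zero
before discarding it; the text does not say which happens first.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    query_id: int
    media_id: str
    first_block: int   # start of the requested block range
    last_block: int    # end of the requested block range
    timestamp: float
    ttl: int


def handle_query(query, cache, neighbors, sender, processing_time=0.0):
    """Return (response, forwards) for a query received from `sender`.

    `cache` is a set of (media_id, block_number) pairs held by this peer.
    """
    wanted = range(query.first_block, query.last_block + 1)
    response = None
    if any((query.media_id, b) in cache for b in wanted):
        response = {
            "blocks": sorted(cache),          # list of all cached blocks
            "ttl": query.ttl,                 # TTL stored in the query
            "time": query.timestamp + processing_time,
        }
    forwards = []
    if query.ttl > 0:                         # TTL of zero: discard
        relayed = replace(query, ttl=query.ttl - 1)
        forwards = [(n, relayed) for n in neighbors if n != sender]
    return response, forwards
```

The response would travel back along the reverse path of the query; that
routing state is not modelled here.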

In limited flooding, for the $k$th round, a peer obtains a set $R$ of peers
based on response messages obtained at round $k - 1$. A peer in $R$ is expected
to have at least one of the blocks from $kN + 1$ to $(k + 1)N$. Since time has
passed from the search at round $k - 1$, some blocks listed in a response
message may already have been replaced by other blocks. Assuming that a peer is
watching a media stream without interactions such as rewinding, pausing, and
fast-forwarding, and that the cache buffer is filled with blocks, the number of
blocks removed can be estimated by dividing the elapsed time from the generation
of the response message by one block time $B_t$. We should note here that we do
not take into account blocks cached after a response message is generated. In
limited flooding, the TTL is set to that of the most distant peer among the
peers in $R$.
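
The estimate of which reported blocks may still be cached can be sketched as
below. The number of removed blocks follows the elapsed-time rule in the text;
which blocks are removed first is not stated there, so the sketch assumes the
reported list is ordered oldest first and that the oldest entries are replaced
first. The function name is hypothetical.

```python
def conjectured_blocks(reported_blocks, response_time, now, block_time):
    """Blocks of a response message that are still expected to be cached.

    removed = elapsed time since the response was generated / block time.
    Assumption: `reported_blocks` is ordered oldest first and the oldest
    entries are the ones replaced.
    """
    elapsed = max(0.0, now - response_time)
    removed = int(elapsed // block_time)
    return list(reported_blocks[removed:])
```

With five reported blocks, a block time of 1 s and 2.5 s elapsed, the first two
entries would be treated as already replaced.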

To attain an even more efficient search, we have also proposed another search
scheme called selective search. The purpose of flooding schemes is to find peers
that do not have any blocks of the current round but do have some blocks
of the next round. Flooding also finds peers that have newly joined our system.
However, in flooding, the number of queries relayed on the network increases
exponentially with the TTL and the number of neighboring peers. As a simple
example, when a query is given a TTL whose value is $H$ and a peer knows $D$
other peers, the total number of query messages relayed on the network becomes
$\sum_{i=1}^{H} (D - 1)^i = O((D - 1)^H)$. Therefore, when a sufficient number
of peers are expected to have blocks in the next round, it is effective for a
peer to directly send queries to those peers. We call this selective search.
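
To give a feel for this growth, the following sketch evaluates the message
count, taking it to be the sum of (D - 1)^i for i from 1 to H. That reading of
the formula and the function name are assumptions made for the example.

```python
def flooding_messages(ttl, degree):
    """Number of relayed queries, taken as sum_{i=1}^{H} (D - 1)^i."""
    return sum((degree - 1) ** i for i in range(1, ttl + 1))


# Growth with the TTL for a peer that knows D = 4 other peers.
for h in (1, 3, 5, 7):
    print(h, flooding_messages(h, 4))
```

Under this reading, a TTL of 7 with D = 4 already yields several thousand
relayed queries for one search, which is why limiting the TTL or querying known
peers directly reduces load.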

By considering the pros and cons of full flooding, limited flooding, and selec- 
tive search, we have proposed efficient methods based on combining them. 

FL method 

The FL method is a combination of full flooding and limited flooding. For 
blocks of the next round, a peer conducts (1) limited flooding if the con- 
jectured contents of cache buffers of peers in R satisfy every block of the 
next round, or (2) full flooding if one or more blocks cannot be found in the 
conjectured cache contents of peers in R. 

FLS method 

The FLS method is a combination of full flooding, limited flooding, and
selective search. For the next round’s blocks, a peer conducts (1) selective
search if the conjectured contents of cache buffers of peers in $R$ contain
every block of the next round, (2) limited flooding if any one of the next
round’s blocks cannot be found in the conjectured cache contents of peers in
$R$, or, finally, (3) full flooding if none of the provider peers it knows is
expected to have any block of the next round, i.e., $R = \emptyset$.
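
The two decision rules can be sketched as follows. The function names and the
representation of conjectured cache contents as a mapping from peer to a set of
block numbers are assumptions made for the example.

```python
def _available(conjectured):
    """Union of the blocks that peers in R are expected to hold."""
    return set().union(*conjectured.values())


def choose_search_fl(next_round_blocks, conjectured):
    """FL method: limited flooding if every block is covered, else full."""
    if set(next_round_blocks) <= _available(conjectured):
        return "limited"
    return "full"


def choose_search_fls(next_round_blocks, conjectured):
    """FLS method: selective search, limited flooding, or full flooding."""
    wanted = set(next_round_blocks)
    available = _available(conjectured)
    if not (wanted & available):
        return "full"        # no known peer is expected to hold any block
    if wanted <= available:
        return "selective"   # every block is expected at some known peer
    return "limited"         # some, but not all, blocks are covered
```

Here `conjectured` would be built from the previous round's response messages
after discounting blocks that are assumed to have been replaced since.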



2.3 Block Retrieval Methods Considering the Continuity of Media 
Play-Out 

The peer sends a request message for the first block of a media stream as soon
as it receives a response message from a peer that has the block. Then, it plays
it out immediately when the reception of the block starts. Consequently, the
deadlines for the retrieval of succeeding blocks $j \geq 2$, $T_p(j)$, are
determined as follows:

$$T_p(j) = T_p(1) + (j - 1) B_t, \tag{1}$$

where $T_p(1)$ corresponds to the time that the peer finishes playing out the
first block and $B_t$ stands for one block time.
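
Equation (1) translates directly into code. The function and parameter names
are hypothetical, and times are assumed to be in seconds.

```python
def playout_deadline(j, first_block_deadline, block_time):
    """Retrieval deadline T_p(j) = T_p(1) + (j - 1) * B_t for block j >= 1."""
    if j < 1:
        raise ValueError("block numbers start at 1")
    return first_block_deadline + (j - 1) * block_time


# Deadlines of the first four blocks for T_p(1) = 5 s and B_t = 2 s.
deadlines = [playout_deadline(j, 5.0, 2.0) for j in range(1, 5)]
```

Each later block simply gains one block time of slack over its predecessor,
which is what allows a block fetched early to leave time for fetching later
blocks from more distant peers.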

We do not wait for the completion of the reception of the preceding block
before issuing a request for the next block because doing so introduces an extra
delay of at least one RTT, and the cumulative delay affects the timeliness and
continuity of media play-out. In our block retrieval mechanism, the peer sends a
request message for block $j$ at $T_r(j)$. As a result, the peer can start to
receive block $j$ just after finishing the retrieval of block $j - 1$, as shown
in Fig. 1. Equation (1) guarantees that the completion time of a block retrieval
is earlier than that of the block play-out. Furthermore, the retrieval of the
next block starts after the completion of the retrieval of the previous block.
As a result, our block retrieval mechanism can maintain the continuity of media
play-out. By observing the way that the response message is received in regard
to the query message, the peer estimates the available bandwidth and the
transfer delay from the provider peer. The estimates are updated through
reception of media data. For more precise estimation, we can use any other
measurement tool as long as it does not disturb media streaming. Every time the
peer receives a response message, it derives the estimated completion time of
retrieval of block $j$, that is, $T_f(j)$, from the block size and the estimated
bandwidth and delay, for each block for which it has not sent a request message
yet. Then, it determines an appropriate peer in accordance with deadline
$T_p(j)$ and calculates time $T_r(j)$ at which it sends a request.

In the provider determination algorithm, the peer calculates set Sj, a set 
of peers having block j. Next, based on the estimation of available bandwidth 
and transfer delay, it derives set £7, a set of peers from which it can retrieve 

block j by deadline Tp{j), from Sj. If Sj = 4>, the peer waits for the arrival of 
the next response message. However, it gives up retrieving and playing block j 
when the deadline Tp(j) passes without finding any appropriate peer. To achieve 
continuous media play-out, it is desirable to shorten the block retrieval time. The 
SF (Select Fastest) method selects a peer whose estimated completion time is 
the smallest among those in Sj. By retrieving block j as fast as possible, the 
remainder of Tp(j) - Tf(j) can be used to retrieve the succeeding blocks from
distant peers. On the other hand, an unexpected cache miss introduces extra 
delay on the client system. The SR (Select Reliable) method selects the peer 
with the lowest possibility of block disappearance among those in Sj. As a result, 
this suppresses block disappearance before a request for block j arrives at the 
provider peer. 
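
The provider determination step can be sketched as follows. This is only a minimal illustration under stated assumptions, not the implementation evaluated in this paper: the completion-time model (request time plus transfer delay plus block size divided by bandwidth), the `Provider` record, and the `eviction_risk` score used for the SR method are hypothetical names and simplifications introduced for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Provider:
    name: str
    bandwidth_bps: float   # estimated available bandwidth (bits per second)
    delay_s: float         # estimated transfer delay (seconds)
    eviction_risk: float   # hypothetical score in [0, 1]; lower means more reliable


def completion_time(p: Provider, request_time_s: float, block_bits: float) -> float:
    """Assumed model of the estimated completion time of a block retrieval."""
    return request_time_s + p.delay_s + block_bits / p.bandwidth_bps


def select_provider(holders: Iterable[Provider], request_time_s: float,
                    block_bits: float, deadline_s: float,
                    method: str = "SF") -> Optional[Provider]:
    """Pick a provider among the peers holding the block.

    Only peers that can deliver the block by the deadline are considered.
    Returns None when no such peer exists (the caller then waits for the
    next response message).
    """
    feasible = [p for p in holders
                if completion_time(p, request_time_s, block_bits) <= deadline_s]
    if not feasible:
        return None
    if method == "SF":   # Select Fastest: smallest estimated completion time
        return min(feasible,
                   key=lambda p: completion_time(p, request_time_s, block_bits))
    if method == "SR":   # Select Reliable: lowest risk of block disappearance
        return min(feasible, key=lambda p: p.eviction_risk)
    raise ValueError(f"unknown method: {method}")
```

With a 625-KByte block and a 10-second deadline, a peer offering 500 kbps with any positive delay is filtered out before either method is applied, since the transfer alone takes 10 seconds.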

2.4 Bio-inspired Cache Replacement Algorithm 

Since the cache buffer size is limited, there may be no room to store a newly 
retrieved block into the cache. Although LRU is a simple and widely used scheme,
it has been shown that LRU cannot accomplish continuous media play-out under
a heterogeneous media popularity [7]. This is because popular media streams are 
cached excessively while unpopular media streams eventually disappear from the 
P2P network. 

In this paper, to solve this problem, we propose a bio-inspired cache re- 
placement algorithm that considers the balance between supply and demand for 
media streams. Since there is no server in a pure P2P network, a peer has to 
conjecture the behavior of other peers by itself. It is disadvantageous for the peer 
to aggressively collect information about supply and demand by communicating 
with other peers, since this brings extra load on the system and deteriorates 
the system scalability. Therefore, in our scheme, a peer estimates the supply 
and demand based on locally available and passively obtained information. This 
information consists of search results it obtained and messages it relayed. We 
expect that each peer can determine an appropriate media stream to be re- 
placed based on its local information, and, as a result, each media stream can 
be cached according to the media popularity in the network. Thus, our proposed 
media streaming is a distributed system constructed by peers. In biology, social 
insects, such as ants, also form a decentralized system. Furthermore, it has been 
pointed out that social insects provide us with a powerful metaphor for creating
decentralized systems of simple interacting agents [13]. In particular, a recently
proposed model of division of labor in a colony of primitively eusocial wasps, 
based on a simple reinforcement of response thresholds, can be transformed into 
a decentralized adaptive algorithm of task allocation [14]. 

In the model of the division of labor, the ratio of individuals that perform a
task is adjusted in a fully-distributed and self-organizing manner. The demand
to perform a task increases as time passes and decreases when it is performed.
The probability that an individual i performs a task is given by the demand,
i.e., stimulus s, and the response threshold $\theta_i$ as $\frac{s^2}{s^2 + \theta_i^2}$, for example. When the

individual i performs the task, the threshold to the task is decreased and thus 
it tends to devote itself to the task. After performing the task several times, it 
becomes a specialist of the task. Otherwise, the threshold is increased. Through 
threshold adaptation without direct interactions among individuals, the ratio of 
individuals that perform a specific task is eventually adjusted to some appropri- 
ate level. As a result, there are two distinct groups that show different behaviors 
toward the task, i.e., one performing the task and the other hesitating to perform 
the task. When individuals performing the task are withdrawn, the associated 
demand increases and so does the intensity of the stimulus. Eventually, the stim- 
ulus reaches the response thresholds of the individuals in the other group, i.e., 
those not specialized for that task. Some individuals are stimulated to perform 
the task, their thresholds decrease, and finally they become specialized for the 
task. Finally, the ratio of individuals with regard to task allocations reaches the 
appropriate level again. 
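
The response-threshold mechanism described above can be illustrated with a short sketch. The function names, the step sizes `learn` and `forget`, and the threshold bounds are illustrative assumptions chosen for the example; only the form of the response function and the direction of the threshold updates come from the model itself.

```python
def response_probability(stimulus: float, threshold: float) -> float:
    """Probability of performing the task: s^2 / (s^2 + theta^2)."""
    s2 = stimulus * stimulus
    denom = s2 + threshold * threshold
    return s2 / denom if denom > 0 else 0.0


def adapt_threshold(threshold: float, performed: bool,
                    learn: float = 10.0, forget: float = 1.0,
                    lower: float = 1.0, upper: float = 1000.0) -> float:
    """Lower the threshold after performing the task, raise it otherwise.

    The step sizes and the clamping range are assumptions of this sketch.
    """
    updated = threshold - learn if performed else threshold + forget
    return min(upper, max(lower, updated))


if __name__ == "__main__":
    theta, stimulus = 500.0, 100.0
    print(response_probability(stimulus, theta))      # small: not yet a specialist
    for _ in range(40):                               # perform the task repeatedly
        theta = adapt_threshold(theta, performed=True)
    print(theta, response_probability(stimulus, theta))
```

In this example, an individual that starts with a threshold of 500 and performs the task 40 times ends with a threshold of 100, at which point a stimulus of 100 triggers the task with probability 0.5. This is the specialization effect described in the text.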

By regarding the replacement of media streams as a task, we propose a cache 
replacement algorithm based on the division of labor and task allocation model. 
Compared with LRU, our proposed cache replacement algorithm can flexibly 
adapt to the temporal changes of supply and demand for media streams. The
proposed algorithm consists of the following two steps.

Step 1. Estimate the supply and demand for media streams per round. A peer
calculates supply S(i) and demand D(i) for media stream i from search
results it received and query and response messages it relayed at the previous
round. S(i) is the number of fully cached copies of stream i in response messages
that it received and relayed. Here, to avoid the overlap of calculation, S(i) includes
only the response messages for media stream i as search results. D(i) is the
number of the query messages for media stream i, which the peer emitted
by itself. To adapt to the temporal changes of supply and demand for media
streams, we use the moving average as follows, where $\bar{S}(i)$ and $\bar{D}(i)$
denote the moving averages of supply and demand.

$$\bar{S}(i) = W_s \bar{S}(i) + (1 - W_s) S(i), \quad (0 < W_s < 1) \tag{2}$$

$$\bar{D}(i) = W_d \bar{D}(i) + (1 - W_d) D(i), \quad (0 < W_d < 1) \tag{3}$$



Step 2. Determine a candidate media stream for replacement. Based on the
“division of labor and task allocation”, we define probability Pr(i) of replacement
of media stream i as follows:

$$P_r(i) = \frac{s^2(i)}{s^2(i) + \theta^2(i)}, \tag{4}$$

where s(i) indicates the ratio of supply to demand for media stream i after
replacement of media stream i, that is, $\max\left(\frac{\bar{S}(i) - 1}{\bar{D}(i)}, 0\right)$,
with $\bar{S}(i)$ and $\bar{D}(i)$ being the moving averages of supply and demand
obtained in Step 1. s(i) means how
excessively media stream i would exist in the network if it were replaced. Therefore,
by discarding a media stream whose s(i) is large, we can expect that the
supply and demand become balanced among streams in a P2P network.
A peer does not discard the media stream that it is currently watching. To
shorten the waiting time for media play-out, it is preferable that the beginning
of a media stream remains cached. Therefore, a peer replaces a media stream
on a block-by-block basis from the end of the media stream. As in [15], the
threshold of a media stream that a peer works on is decreased. As a result, a
media stream tends to be discarded often and sequentially once it is chosen
as a victim.



$$\theta(j) = \begin{cases} \theta(j) - \xi & \text{if } j = i \\ \theta(j) + \varphi & \text{if } j \neq i \end{cases} \tag{5}$$



By sequentially replacing blocks of the same media stream, fragmentation of 
media streams can be avoided. Since our mechanism replaces a media stream 
from the end of the media stream, the increase in the fragmentation leads 
to the disappearance of the latter part of media streams from the network. 
We expect that controlling the threshold is one way to solve this problem. 
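
The two steps can be sketched in code as follows. This is a hedged illustration rather than the implementation used in our simulations. The class and method names are hypothetical. Treating a zero demand average as a very small positive value (`eps`) and drawing the victim in proportion to the replacement probabilities are assumptions, since the text does not specify either detail. Block-by-block eviction from the end of the stream is omitted.

```python
import random
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Optional


@dataclass
class StreamState:
    supply_avg: float = 0.0    # moving average of supply
    demand_avg: float = 0.0    # moving average of demand
    threshold: float = 500.0   # response threshold theta(i)


class ReplacementPolicy:
    def __init__(self, stream_ids: Iterable[Hashable], w_s: float = 0.9,
                 w_d: float = 0.9, xi: float = 10.0, phi: float = 1.0,
                 theta_min: float = 1.0, theta_max: float = 1000.0,
                 eps: float = 1e-9, rng: Optional[random.Random] = None):
        self.state: Dict[Hashable, StreamState] = {i: StreamState() for i in stream_ids}
        self.w_s, self.w_d, self.xi, self.phi = w_s, w_d, xi, phi
        self.theta_min, self.theta_max, self.eps = theta_min, theta_max, eps
        self.rng = rng or random.Random()

    def end_of_round(self, supply: Dict[Hashable, float],
                     demand: Dict[Hashable, float]) -> None:
        """Step 1: fold the counts observed in the last round into the averages."""
        for i, st in self.state.items():
            st.supply_avg = self.w_s * st.supply_avg + (1 - self.w_s) * supply.get(i, 0)
            st.demand_avg = self.w_d * st.demand_avg + (1 - self.w_d) * demand.get(i, 0)

    def stimulus(self, i: Hashable) -> float:
        """Supply-to-demand ratio after one copy of stream i is removed."""
        st = self.state[i]
        # Assumption: guard against division by zero when no demand was seen.
        return max((st.supply_avg - 1.0) / max(st.demand_avg, self.eps), 0.0)

    def probability(self, i: Hashable) -> float:
        """Replacement probability s^2 / (s^2 + theta^2)."""
        s2 = self.stimulus(i) ** 2
        return s2 / (s2 + self.state[i].threshold ** 2)

    def pick_victim(self, watching: Optional[Hashable] = None) -> Optional[Hashable]:
        """Step 2: choose a victim, then adapt all thresholds."""
        candidates = [i for i in self.state if i != watching]
        weights = [self.probability(i) for i in candidates]
        if not candidates or sum(weights) <= 0.0:
            return None
        # Assumption: sample the victim in proportion to its probability.
        victim = self.rng.choices(candidates, weights=weights, k=1)[0]
        for j, st in self.state.items():
            delta = -self.xi if j == victim else self.phi
            st.threshold = min(self.theta_max, max(self.theta_min, st.threshold + delta))
        return victim
```

Because the victim's threshold is lowered while all others are raised, a stream that has been chosen once becomes more likely to be chosen again, which matches the sequential discarding behavior described above.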
Fig. 2. Random network with 100 peers



3 Simulation Evaluation 

We have already evaluated our proposed search and retrieval methods and have 
shown that the FLS method can accomplish continuous media play-out with a 
smaller amount of search traffic compared with full flooding in [7]. In this section, 
we conduct simulation experiments to evaluate our proposed cache replacement 
algorithm in terms of the completeness of media play-out and insensitivity to 
parameter setting. 

3.1 Simulation Model 

We use P2P logical networks with 100 peers randomly generated by the Waxman 
algorithm [16] with parameters α = 0.15 and β = 0.3. An example of generated
networks is shown in Fig. 2. The round trip time between two contiguous peers is 
also determined by the Waxman algorithm and ranges from 10 ms to 660 ms. To 
investigate the ideal characteristics of our proposed mechanisms, the available 
bandwidth between two arbitrary peers does not change during a simulation 
experiment. The bandwidth is given at random between 500 kbps and 600 kbps, 
which is larger than the media coding rate of CBR 500 kbps. 

At first, all 100 peers participate in the system, but no peer watches a media 
stream. One peer begins to request a media stream at a randomly determined 
time. The inter-arrival time between two successive requests for the first media 
stream follows an exponential distribution whose average is 20 minutes. Forty 
media streams of 60-minute length are available. Media streams are numbered
from 1 (the most popular) to 40 (the least popular), where popularity follows
a Zipf-like distribution with α = 1.0. Therefore, media stream 1 is forty times
more popular than media stream 40. Each peer watches a media stream without 
interactions such as rewinding, pausing, and fast-forwarding. When a peer fin- 
ishes watching a media stream, it becomes idle during the waiting time, which 
also follows an exponential distribution whose average is 20 minutes. 

A media stream is divided into blocks of 10-sec duration, each amounting to
625 KBytes. Each peer sends a query message for a succession of six blocks,
i.e., N = 6, and retrieves blocks. Obtained blocks are deposited into a cache
buffer of 675 MB, which corresponds to three media streams. In the first run of
the simulation, each peer stores three whole media streams in its cache buffer.
The population of each media stream in the network also follows a Zipf-like
distribution whose parameter α is 1.0. We set the parameters of moving average,
Ws and Wd, both to 0.9. Based on the values used in [13], we set the
parameters of the cache replacement algorithm as follows: ξ = 10, φ = 1, and the
initial value of θ(i) is set to 500 and θ(i) changes between 1 and 1000. To prevent
the initial condition of the cache buffer from influencing system performance, we 
only use the results after the initially cached blocks are completely replaced with 
newly retrieved blocks for all peers. 
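
The figures quoted in the simulation model are mutually consistent, as the following short calculation shows. It is a worked check only, assuming decimal units (1 KByte = 1000 bytes, 1 MB = 10^6 bytes) and a popularity of stream i proportional to 1/i^α. The constant names are introduced for this example.

```python
CODING_RATE_BPS = 500_000        # CBR media coding rate (500 kbps)
BLOCK_DURATION_S = 10            # one block covers 10 seconds of media
STREAM_LENGTH_S = 60 * 60        # each stream is 60 minutes long
NUM_STREAMS = 40
ZIPF_ALPHA = 1.0

block_bytes = CODING_RATE_BPS * BLOCK_DURATION_S // 8     # bytes per block
blocks_per_stream = STREAM_LENGTH_S // BLOCK_DURATION_S   # blocks per stream
stream_bytes = block_bytes * blocks_per_stream            # bytes per stream
cache_bytes = 3 * stream_bytes                            # three whole streams

# Zipf-like popularity: weight of stream i is 1 / i**alpha, then normalized.
weights = [1.0 / (i ** ZIPF_ALPHA) for i in range(1, NUM_STREAMS + 1)]
total = sum(weights)
popularity = [w / total for w in weights]

print(block_bytes, blocks_per_stream, stream_bytes, cache_bytes)
print(popularity[0] / popularity[-1])   # ratio between streams 1 and 40
```

The block size comes out at 625,000 bytes, one stream at 360 blocks (225 MB), and three streams at 675 MB, which agrees with the cache buffer size stated above.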

Since there is almost no difference in simulation results among the six combinations
of search methods and block retrieval methods in our experiments, we
only show the results of the combination of the FLS and SF methods. We show
the average values of 60 sets of simulations in the following figures. We define
the waiting time as the time between the emission of the first query message 
for the media stream and the beginning of reception of the first block. Through 
simulation experiments, we observe that, independent of method, the waiting 
time decreases as the popularity increases. However, independent of popularity, 
all media streams successfully found can be played out within 2.6 sec. 

3.2 Evaluation of Completeness of Media Play-Out 

To evaluate the completeness of media play-out, we define completeness as the 
ratio of the number of retrieved blocks in time to the number of blocks in a 
media stream. Figures 3 and 4 illustrate the completeness with a 95% confi- 
dence interval of each media stream. The horizontal axis indicates the media 
popularity that decreases with the growth of the media identifier. Comparing
Fig. 3 with Fig. 4, we find that our proposed algorithm mitigates the drop
in completeness as media popularity decreases. As a result, for
unpopular media streams, the completeness of the proposed algorithm is higher 
than that of LRU by 0.2 at most. On the other hand, our proposed algorithm 
slightly deteriorates the completeness for popular media streams compared with 
the performance of LRU. 

Since a media stream is selected based on the Zipf-like distribution, the completeness
of popular media streams is more important than that of unpopular
media streams, in terms of the total degree of user satisfaction. Here, we define
weighted completeness, which means the completeness weighted with the media
popularity, as follows.

$$W(i) = C(i) \times \frac{1}{i}, \tag{6}$$

where C(i) is the completeness of media stream i. Furthermore, we define the user
satisfaction as $\sum_i W(i)$. Table 1 shows the user satisfaction of Figs. 3 and 4. As
shown in Table 1, there is almost no difference between LRU and the proposed
algorithm. Therefore, we can conclude that our proposed algorithm can accomplish
high completeness even for unpopular media streams without deteriorating
the total degree of user satisfaction.

Fig. 3. Completeness (LRU)

Table 1. Degree of user satisfaction

                      FLS method
LRU                   3.870
Proposed algorithm    3.855
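
As a hedged illustration of the user satisfaction metric, the sketch below assumes that the popularity weight of media stream i is 1/i, which is the reading that fits a Zipf-like distribution with parameter 1.0. The function name is hypothetical.

```python
from typing import Sequence


def user_satisfaction(completeness: Sequence[float]) -> float:
    """Sum of completeness values weighted by 1/i (stream identifiers start at 1).

    The 1/i weight is an assumption of this sketch.
    """
    return sum(c / i for i, c in enumerate(completeness, start=1))


# Upper bound with 40 streams, reached when every stream is fully complete:
# the 40th harmonic number, about 4.2785.
upper_bound = user_satisfaction([1.0] * 40)
print(upper_bound)
```

Under this assumption, the values reported in Table 1 would correspond to roughly 90% of the attainable maximum for both algorithms.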

3.3 Evaluation of Insensitivity to Parameter Setting 

Parameter setting is a common problem in network control mechanisms. It has 
been pointed out that it is difficult to select an appropriate parameter statically. 
To solve this problem, the division of labor and task allocation dynamically 
changes the response threshold in accordance with the information obtained 
from the network environment. As a result, it can flexibly adapt to diverse network
environments without a specific parameter setting. Figure 5 illustrates the
completeness of the proposed cache replacement algorithm with normalized parameters:
ξ = 0.01, φ = 0.001. The initial value of θ(i) is set to 0.5, and θ(i)
changes between 0.001 and 1. Furthermore, to make the weight of s(i) and θ(i)
in Eq. (4) uniform, s(i) is normalized by dividing by $\sum_i s(i)$. Comparing Fig. 4
with Fig. 5, we find that there is almost no difference between them. Therefore,
we can conclude that our proposed cache replacement algorithm is insensitive to
parameter setting.

Fig. 4. Completeness (proposed cache replacement algorithm)



4 Conclusions 

In this paper, to accomplish scalable and continuous media streaming on P2P 
networks, we introduced two scalable search methods and two algorithms for 
block retrieval and proposed a bio-inspired cache replacement algorithm that 
considers the balance between supply and demand for media streams. Through 
several simulation experiments, we have shown that our proposed cache replacement
algorithm can accomplish continuous media play-out independent of media
popularity. Furthermore, our simulation results show that the proposed cache
replacement algorithm is not sensitive to parameter setting.

As future research work, we should evaluate our proposed mechanisms in 
more realistic situations where network conditions dynamically change and peers 
randomly join and leave the system. 
Abstract. In mobile ad-hoc networks, nodes act both as terminals and information
relays, and participate in a common routing protocol, such as Dynamic Source
Routing (DSR). The network is vulnerable to routing misbehavior, due to faulty 
or malicious nodes. Misbehavior detection systems aim at removing this vulner- 
ability. In this paper we investigate the use of an Artificial Immune System (AIS) 
to detect node misbehavior in a mobile ad-hoc network using DSR. The system 
is inspired by the natural immune system of vertebrates. Our goal is to build a 
system that, like its natural counterpart, automatically learns and detects new mis- 
behavior. We describe the first step of our design; it employs negative selection, 
an algorithm used by the natural immune system. We define how we map the 
natural immune system concepts such as self, antigen and antibody to a mobile 
ad-hoc network, and give the resulting algorithm for misbehavior detection. We 
implemented the system in the network simulator Glomosim; we present detection 
results and discuss how the system parameters impact the results. Further steps 
will extend the design by using an analogy to the innate system, danger signals, 
costimulation and memory cells. 



1 Introduction 

1.1 Problem Statement: Detecting Misbehaving Nodes in DSR 

Mobile ad-hoc networks are self organized networks without any infrastructure other 
than end user terminals equipped with radios. Communication beyond the transmission 
range is made possible by having all nodes act both as terminals and information relays. 
This in turn requires that all nodes participate in a common routing protocol, such as 
Dynamic Source Routing (DSR) [16]. A problem is that DSR works well only if all 
nodes execute the protocol correctly, which is difficult to guarantee in an open ad-hoc 
environment. 

A possible reason for node misbehavior is faulty software or hardware. In classical 
(non ad-hoc) networks run by operators, equipment malfunction is known to be an 
important source of unavailability [17]. In an ad-hoc network, where routing is performed
by user-provided equipment, we expect the problem to be exacerbated. Another reason
for misbehavior stems from the desire to save battery power: some nodes may run
modified code that pretends to participate in DSR but, for example, does not forward
packets. Last, some nodes may also be truly malicious and attempt to bring the network
down, as do Internet viruses and worms. An extensive list of such misbehavior is given
in [2]. The main operation of DSR is described in Section 2.1. In our simulation, we
implement a faulty node that, from time to time, does not forward data or route requests, 
or does not respond to route requests from its own cache. 

We consider the problem of detecting nodes that do not correctly execute the DSR 
protocol. The actions taken after detecting that a node misbehaves range from forbidding
use of the node as a relay [1] to excluding the node entirely from any participation in the
network [3]. In this paper we focus on the detection of misbehavior and do not discuss 
actions taken after detection. 

We chose DSR as a concrete example, because it is one of the protocols being 
considered for standardization for mobile ad-hoc networks. There are other routing 
protocols, and there are parts of mobile ad-hoc networks other than routing that need 
misbehavior detection, for example, the medium access control protocol. We believe the 
main elements of our method would also apply there, but a detailed analysis is for further 
work. 



1.2 Traditional Misbehavior Detection Approaches 

Traditional approaches to misbehavior detection [1,3] use the knowledge of anticipated 
misbehavior patterns and detect them by looking for specific sequences of events. This 
is very efficient when the targeted misbehavior is known in advance (at system design) 
and powerful statistical algorithms can be used [4]. 

To detect misbehavior in DSR, Buchegger and Le Boudec use a reputation system 
[3]. Every node calculates the reputation of every other node using its own first hand 
observations and second hand information obtained from others. The reputation of a 
node is used to determine whether countermeasures against the node are undertaken or 
not. A key aspect of the reputation system is how second hand information is used, in 
order to avoid false accusations [3]. 

The countermeasures against a misbehaving node aim to isolate it, i.e., packets
will not be routed through the node and packets sent from the node will be ignored. In this
way nodes are stimulated to cooperate in order to get service and maximize their utility, 
and the network also benefits from the cooperation. 

Even if not presented by its authors as an artificial immune system, the reputation
system in [3,4] is an example of a (non-bio-inspired) immune system. It contains interactions
between its healthy elements (well behaving nodes) and detection and exclusion
reactions against non-healthy elements (misbehaving nodes). We can compare it to the 
natural innate immune system (Section 2.2), in the sense that it is hardwired in the nodes 
and changes only with new versions of the protocol. 

Traditional approaches lack the ability to learn about and adapt to new misbehavior.
Every target misbehavior has to be imagined in advance and explicitly addressed in the
detection system. This is our motivation to use an artificial immune system approach. 



1.3 Artificial Immune System (AIS) Approaches 

An AIS uses an analogy with the natural Immune System (IS) of vertebrates. As a first 
approximation, the IS can be described with the “self-nonself” model, as follows (we
give more details in Section 2.2).

The IS is thought to be able to classify cells that are present in the body as self and
non-self cells. The IS is made of two distinct sets of components: the innate IS, and
the adaptive IS. The innate IS is hard-wired to detect (and destroy) non-self cells that
contain, or do not contain, specific patterns on their surface.

The adaptive IS is more complex. It produces a large number of randomly created
detectors. A “negative selection” mechanism eliminates detectors that match any cell
present in a protected environment (bone marrow and the thymus) where only self cells
are assumed to be present. Non-eliminated detectors become “naive” detectors; they die
after some time, unless they match something (assumed to be a pathogen), in which
case they become memory cells. Further, detectors that do match a pathogen are quickly
multiplied (“clonal selection”); this is used to accelerate the response to further attacks.
Also, since the clones are not exact replicates (they are mutated, the mutation rate being
an increasing function of affinity between detector and antigen) this provides a more
focused response to the pathogen (“affinity maturation”). This also provides adaptation
to a changing non-self environment. 

The self-nonself model is only a very crude approximation of the adaptive IS. Another
important aspect is the “danger signal” model [11,12]. With this model, matching by the
innate or adaptive mechanism is not sufficient to cause detection; an additional danger 
signal is required. The danger signal is for example generated by a cell that dies before 
being old. The danger signal model better explains how the IS adapts not only to a 
changing non-self, but also to some changes in self. There are many more aspects to the 
IS, some of which are not yet fully understood (see Section 2.2). 
Fig. 1. From the natural IS to an AIS: Making DSR immune to node misbehavior.



1.4 Artificial Immune Systems - Related Work 

Hofmeyr and Forrest use an AIS for intrusion detection in wired local area networks
[5,6]. Their work is based on the negative selection part of the self-nonself model and 
some form of danger signal. TCP connections play the role of self and nonself cells. One 
connection is represented by a triplet encoding sender’s destination address, receiver’s 
destination address and receiver’s port number. A detector is a bit sequence of the same 
length as the triplet. A detector matches a triplet if both have M contiguous bits equal,
where M is a fixed system parameter. Candidate detectors are generated randomly; in 
a learning phase, detectors that match any correct (i.e. self) triplets are eliminated. This 
is done offline, by presenting only correct TCP connections. Non-eliminated detectors 
have a finite lifetime and die unless they match a non-self triplet, as in the IS. The 
danger signal is also used: it is sent by humans as confirmation in case of potential 
detection. This is a drawback, since human intervention is required to eliminate false 
positives, but it allows the system to learn changes in the self. With the terminology of 
statistical pattern classification, this use of the danger signal can be viewed as some form 
of supervised training. Similarly, Dasgupta and Gonzalez [20] use an AIS approach to 
intrusion detection, based on negative selection and genetic algorithms. 
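
The negative selection scheme with M-contiguous-bit matching described above can be sketched as follows. This is a generic illustration and not the implementation of [5,6]: the function names, the bit-string length, and the rejection-sampling loop with a `max_tries` cap are assumptions made for the example.

```python
import random
from typing import Iterable, List


def matches(detector: str, pattern: str, m: int) -> bool:
    """True if the two bit strings agree on at least m contiguous positions."""
    run = 0
    for a, b in zip(detector, pattern):
        run = run + 1 if a == b else 0
        if run >= m:
            return True
    return False


def generate_detectors(self_set: Iterable[str], n_detectors: int, length: int,
                       m: int, rng: random.Random,
                       max_tries: int = 100_000) -> List[str]:
    """Learning phase: keep only random candidates that match no self pattern."""
    self_patterns = list(self_set)
    detectors: List[str] = []
    tries = 0
    while len(detectors) < n_detectors and tries < max_tries:
        tries += 1
        candidate = "".join(rng.choice("01") for _ in range(length))
        if not any(matches(candidate, s, m) for s in self_patterns):
            detectors.append(candidate)   # survived negative selection
    return detectors


def is_nonself(pattern: str, detectors: Iterable[str], m: int) -> bool:
    """Detection phase: a pattern is flagged if any detector matches it."""
    return any(matches(d, pattern, m) for d in detectors)
```

By construction, no surviving detector matches a pattern that was presented as self during the learning phase. Finite detector lifetime and the danger signal are not modeled here.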

A major difficulty in building an artificial immune system in our framework is the 
mapping from biological concepts to computer network elements. Kim and Bentley 
show that straightforward mappings have computational problems and lead to poor 
performance, and they introduce a more efficient representation of self and nonself than 
in [5]. They show the computational weakness of negative selection and add clonal 
selection to address this problem [8]. In their subsequent papers, they examine clonal 
selection with negative selection as an operator [9], and dynamical clonal selection [10], 
showing how different parameters impact detection results. For an overview of AIS, see 
the book by de Castro and Timmis [19] and the paper by de Castro and von Zuben [18]. 

1.5 Contribution of This Paper and Organization 

Our long term goal is to understand whether our previous work, based on the traditional 
approach [2], can benefit from an AIS approach that introduces learning and adapting 
mechanisms. This paper is our first step in this direction. 

The first problem to solve is mapping the natural IS concepts to our framework. This 
is a key issue that strongly influences the detection capabilities. We describe our solution 
in Section 3.1. For the representation of self-nonself and for the matching functions, we 
start from the general structure proposed by Kim and Bentley [8], which we adapt to 
our case. Then we define the resulting algorithm, which, in this first step, is based only 
on negative selection. Our main contribution in this phase is the definition of a mapping 
and a construction of an AIS adapted to our case, its implementation in the Glomosim 
simulator, and its performance analysis. 

The rest of the paper is organized as follows. Section 2 gives background and termi- 
nology on DSR and the natural immune system. Section 3 gives the mapping from the IS 
to the detection system for DSR misbehavior detection, and the detailed definition of the 
detection system. Section 4 gives simulation specific assumptions and constraints, sim- 
ulation results and discussion of the results. Section 5 draws conclusions and describes 
what we have learned and how we will exploit it in future steps. 



2 Background 

2.1 DSR: Basic Operations 



The Dynamic Source Routing protocol is one of the candidate standards for routing in mobile
ad hoc networks [16]. A “source route” is a list of nodes that can be used as intermediate
relays to reach a destination. It is written in the data packet header at the source;
intermediate relays simply look it up to determine the next hop.

DSR specifies how sources discover, maintain and use source routes. To discover a
source route, a node broadcasts a route request packet. Nodes that receive a route request
add their own address in the source route collecting field of the packet, and broadcast
the packet, except in two cases. The first case is if the same route request was already
received by a node; then the node discards the packet. Two received route requests are
considered to be the same if they belong to the same route discovery, which is identified by
the same values of the source, destination and sequence number fields in the request packets.
The second case is if the receiving node is the destination of the route discovery, or if it
already has a route to the destination in its cache; then the node sends a route reply
message that contains a completed source route. If links in the network are bidirectional,
the route replies are sent over the reversed collected routes. If links are not bidirectional,
the route replies are sent to the initiator of the route discovery as included in a new
route request generated by answering nodes. The new route requests will have as the
destination the source of the initial route request. The node that initiates the original route
request usually gets several route replies, each containing a different route. The replies
that arrive earlier than others are expected to indicate better routes, because for a node
to send a route reply, it is required to wait first for a time proportional to the number
of hops in the route it has as answer. If a node hears that some neighbor node answers
during this waiting time, it supposes that the route it has is worse than the neighbor’s
one, and it does not answer. This avoids route reply storms and unnecessary overhead.
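
The handling of a received route request can be sketched as below. This is a simplified illustration of the rules just described, not the DSR specification or our simulator code: the `RouteRequest` and `Node` types are hypothetical, the collected route is assumed to start with the source address, and the reply back-off timer and the unidirectional-link case are omitted.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple


@dataclass(frozen=True)
class RouteRequest:
    source: str
    destination: str
    seq: int
    route: Tuple[str, ...]   # addresses collected so far, starting at the source


class Node:
    def __init__(self, address: str,
                 route_cache: Optional[Dict[str, Tuple[str, ...]]] = None):
        self.address = address
        # Cached routes: destination -> remaining hops after this node.
        self.route_cache = dict(route_cache or {})
        self.seen: Set[Tuple[str, str, int]] = set()

    def handle(self, req: RouteRequest):
        """Return (action, payload) for a received route request."""
        key = (req.source, req.destination, req.seq)
        if key in self.seen:                 # first case: duplicate request
            return ("discard", None)
        self.seen.add(key)
        route = req.route + (self.address,)
        if self.address == req.destination:  # second case: we are the target
            return ("reply", route)
        if req.destination in self.route_cache:   # second case: cached route
            return ("reply", route + self.route_cache[req.destination])
        # Otherwise append our own address and rebroadcast the request.
        return ("rebroadcast",
                RouteRequest(req.source, req.destination, req.seq, route))
```

A request is identified by its source, destination, and sequence number, so a node that hears the same discovery again over a different path discards it instead of rebroadcasting.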

After the initiator of route discovery gets the first route reply, it sends data over the obtained
route. While packets are sent over the route, the route is maintained, in such a way that 
every node on the route is responsible for the link over which it sends packets. If some 
link in the route breaks, the node that detects that it cannot send over that link should 
send error messages to the source. Additionally it should salvage the packets destined 
to the broken link, i.e., reroute them over alternate partial routes to the destination. 

The mechanisms just described are the basic operation of DSR. There are also some 
additional mechanisms, such as gratuitous route replies, caching routes from forwarded 
or overheard packets and DSR flow state extension [16]. 



2.2 The Natural Immune System 

The main function of the IS is to protect the body against different types of pathogens, 
such as viruses, bacteria and parasites and to clear it from debris. It consists of a large 
number of different innate and acquired immune cells, which interact in order to provide 
detection and elimination of the attackers [13]. We present a short overview based on 
the self-nonself and the danger models [13,12]. 

Functional architecture of the IS. The first line of defense of the body consists of physical
barriers: skin and mucous membranes of digestive, respiratory and reproductive tracts. 
It prevents the body from being entered easily by pathogens. 

The innate immune system is the second line of defense. It protects the body against 
common bacteria, worms and some viruses, and clears it from debris. It also interacts 
with the adaptive immune system, signaling the presence of damage in self cells and 
activating the adaptive IS. 
The adaptive immune system learns about invaders and tunes its detection mechanisms
to better match previously unknown pathogens. It provides an effective protection
against viruses even after they enter the body cells. It adapts to newly encountered viruses 
and memorizes them for more efficient and fast detection in the future. 

The innate immune system consists of macrophage cells, complement proteins and
natural killer cells. Macrophages are big cells that are attracted by bacteria to engulf
them in the process called “phagocytosis”. Complement proteins can also destroy some
common bacteria. Both macrophages and complement proteins send signals to other 
immune cells when there is an attack. 

The adaptive immune system consists of two main types of lymphocyte cells: B cells 
and T cells. Both B and T cells are covered with antibodies. Antibodies are proteins
capable of chemically binding to nonself antigens. Antigens are proteins that cover the
surface of self and nonself cells. Whether chemical binding will happen between an
antibody and an antigen depends on the complementarity of their three-dimensional
chemical structures. If it does, the antigen and the antibody are said to be “cognate”.
Because this complementarity does not have to be exact, an antibody may have several 
different cognate antigens. What happens after binding depends on additional control 
signals exchanged between different IS cells, as we explain next. 

One B cell is covered by only one type of antibody, but two B cells may have very 
different antibodies. As there are many B cells (about 1 billion fresh cells are created 
daily by a healthy human), there is also a large number of different antibodies at the 
same time. How is this diversity of antibodies created and why do antibodies not match
self antigens? The answer is in the process of creating B cells. B cells are created from
stem cells in the bone marrow by rearrangement of genes in immature B cells. Stem cells 
are generic cells from which all immune cells derive. Rearrangement of genes provides 
diversity of B cells. Before leaving bone marrow, B cells have to survive negative 
selection: if the antibodies of a B cell match any self antigen present in the bone marrow 
during this phase, the cell dies. The cells that survive are likely to be self tolerant. 

B cells are not fully self tolerant, because not all self antigens are presented in bone 
marrow. Self tolerance is provided by T cells that are created in the same way as B cells, 
but in the thymus, the organ behind the breastbone. T cells are self tolerant because almost 
all self antigens are presented to these cells during negative selection in the thymus. 

After some antibodies of a B cell match antigens of a pathogen or self cell (we call this
event “signal 1b”), these antigens are processed and presented on the surface of the B cell.
Major-Histocompatibility-Complex (MHC) molecules are used for this, and it is their only
function. If antibodies of some T cell bind to these antigens and if the T cell is activated
(by some additional control signal), the detection is verified and a confirmation signal is
sent from the T cell to the B cell (we call this event “signal 2b”). Signal 2b starts the process
of producing new B cells that will be able to match the pathogen better. This process
is called clonal selection. If signal 2b is absent, it means that the detected antigens are 
probably self antigens for which the T cells are tolerant. In this last case, the B cell will 
die together with its self reactive antibodies. 

B cells can begin clonal selection without confirmation by signal 2b, but only in the 
case when matching between B cell antibodies and antigens is very strong. This occurs 
with a high probability only for memory B cells, the cells that were verified in the past 
to match nonself antigens. 
In the clonal selection phase, a B cell divides into a number of clones with similar
but not strictly identical antibodies. Statistically, some clones will match the pathogen
that triggered the clonal selection better than the original B cells and some will match
it worse. If the pathogens whose antigens triggered clonal selection are still present,
they will continue to trigger new B cells that match the pathogen well to clone. The process
continues reproducing B cells more and more specific to present pathogens. B cells 
that are specific enough become memory B cells and do not need costimulation by 
signal 2b. This is a process with positive feedback and it produces a large number of 
B cells specific to the presented pathogen. Additionally, B cells secrete chemicals that 
neutralize pathogens. The process stops when pathogens are cleared from the body. 
Debris produced by the process are cleared by the innate immune system. Memory B 
cells live long and they are ready to react promptly to the same cognate pathogen in the 
future. Whereas first time encountered pathogens require a few weeks to be learned and 
cleared by the IS, the secondary reaction by memory B cells takes usually only a few 
days. 

The Danger Signal is an additional control used for activating T cells. After T cell 
antibodies bind to antigens presented by MHC of a B cell (signal 1t), the T cell is
activated and sends signal 2b to a B cell only if it receives a confirmation signal (signal
2t) from an Antigen Presenting Cell (APC). The APC will give signal 2t to a T cell only
if it engulfed the same nonself antigens, which happens only when the APC receives a
danger signal from self cells or from the innate immune system. The danger signal is
generated when there is some damage to self cells, which is usually due to pathogens. 
As an example, the danger signal is generated when a cell dies before being old; the cell 
debris are different when a cell dies out of old age or when it is killed by a pathogen. 

There are many other subtle mechanisms in the IS, and not all of them are fully 
understood. In particular, time constants of the regulation system (lifetime of B and T 
cells, probability of reproduction) seem to play an important role in the performance of 
the IS [14]. We expect that we have to tune similar parameters carefully in an AIS. 

3 Design of Our Detection System 

3.1 Mapping of Natural IS Elements to Our Detection System 

The elements of the natural IS used in our detection system are mapped as follows. 

Body: the entire mobile ad-hoc network 
Self-Cells: well behaving nodes 
Non-Self Cells: misbehaving nodes 

Antigen: Sequence of observed DSR protocol events recognized in sequence of packet
headers. Examples of events are “data packet sent”, “data packet received”, “data
packet received followed by data packet sent”, “route request packet received followed
by route reply sent”. The sequence is mapped to a compact representation as
explained in Section 3.2.

Antibody: A pattern with the same format as the compact representation of antigen 
(Section 3.2). 

Chemical Binding: binding of antibody to antigen is mapped to a “matching function", 
as explained in Section 3.2. 
Negative Selection and Bone Marrow: Antibodies are created during an offline learning
phase. The bone marrow (protected environment) is mapped to a network with
only certified nodes. In a deployed system this would be a testbed with nodes deployed
by an operator; in our simulation environment, this is a preliminary simulation
phase.



3.2 Antigen, Antibody, and Matching Function 

Antigens could be represented as traces of observed protocol events. Protocol events 
and their timings constitute the behavior of a node, and our goal is to detect this behavior 
if it is wrong. However, even in low bit rate networks, this rapidly generates sequences 
that are very long (for a 100 seconds observation interval, a single sequence may be up 
to 1 Gbit long), thus making generation of a large number of patterns prohibitive. This 
was recognized and analyzed by Kim and Bentley in [8] and we follow the conclusions, 
which we adapt to our case, as we describe now. 

A node in our system monitors its neighbors and collects one protocol trace per 
monitored neighbor. A protocol trace consists of a series of data sets, collected on non- 
overlapping intervals during activity time of a monitored neighbor. One data set consists
of protocol events recorded during one time interval of duration $\Delta t$ ($\Delta t$ = 10 s by
default), with an additional constraint of at most Ng events per data set (Ng = 40
by default).

Data sets are then transformed as follows. First, protocol events are mapped to a 
finite set of primitives, identified with labels. In the simulation, we use the following list. 

A = RREQ sent        E = RREQ received
B = RREP sent        F = RREP received
C = RERR sent        G = RERR received
D = DATA sent and IP source address is not of the monitored node
H = DATA received and IP destination address is not of the monitored node

A data set is then represented as a sequence of labels from the alphabet defined
above, for example

$l_1$ = (EAFBHHEDEBHDHDHHDHD, ...)

Second, a number of “genes” are defined. A gene is an atomic pattern used for matching.
We use the following list.

Gene 1 = #E in sequence
Gene 2 = #(E*(A or B)) in sequence
Gene 3 = #H in sequence
Gene 4 = #(H*D) in sequence

where #(’sub-pattern’) is the number of occurrences of ’sub-pattern’ contained in a
sequence such as $l_1$, with * representing one arbitrary label or no label at all. For
example, #(E*(A or B)) is the number of sub-patterns that are two or three labels long,
and that begin with E and end with A or B. The genes are used to map a sequence such
as $l_1$ to an intermediate representation that gives the values of the different genes in one
data set. For example, $l_1$ is mapped to an antigen that consists of the four genes:

$l_2$ = (3 2 7 6)
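
The gene counts in the example above can be reproduced with a short sketch. It is illustrative only: the function names are hypothetical, and it assumes that a start position where both the two-label and the three-label sub-pattern match contributes two counts, which is one reading of the definition of #(...).

```python
def count_gap_pattern(seq, first, last_set):
    """Count sub-patterns that start with `first`, end with a label in
    `last_set`, and have zero or one arbitrary label in between."""
    count = 0
    for i, label in enumerate(seq):
        if label != first:
            continue
        for gap in (0, 1):          # '*' stands for no label or one label
            j = i + 1 + gap
            if j < len(seq) and seq[j] in last_set:
                count += 1
    return count


def genes(seq):
    """Map a label sequence to the four gene values of Section 3.2."""
    return (
        seq.count("E"),                      # Gene 1 = #E
        count_gap_pattern(seq, "E", "AB"),   # Gene 2 = #(E*(A or B))
        seq.count("H"),                      # Gene 3 = #H
        count_gap_pattern(seq, "H", "D"),    # Gene 4 = #(H*D)
    )


trace = "EAFBHHEDEBHDHDHHDHD"   # the visible part of the example sequence
print(genes(trace))             # (3, 2, 7, 6)
```

On the visible part of the example sequence this yields the gene values (3 2 7 6) quoted in the text.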
The genes are defined with the intention to translate raw protocol sequences into more
meaningful descriptions. Concretely, we define them in such a way as to capture correlation
between protocol events. For a normal DSR operation (Section 2.1), the values of genes
1 and 2 are correlated, as well as the values of genes 3 and 4. In the case of misbehavior,
this correlation will change. This is a manual way to define the genes, and we discuss
alternatives in Section 5.

Finally, a gene value is encoded on Ng bits (Ng = 10 by default) as follows. The
range of values of a gene that are below some threshold value is uniformly divided
into Ng intervals. The position of the interval to which the gene value belongs gives the
position of the bit that is set to 1 for the gene in the final representation. The threshold is
expected to be reached or exceeded rarely. The values above the threshold are encoded
as if they belong to the last interval. Other bits are set to 0. For example, if Ng = 10 and
if the threshold value for all the four defined genes is equal to 20, $l_2$ is mapped to the
final antigen format:

$l_3$ = (0000000010 0000000010 0000001000 0000001000)

There is one antigen such as $l_3$ every $\Delta t$ seconds, for every monitored node, during the
activity time of the monitored node. Every bit in this representation is called a “nucleotide”.
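
The encoding step can be sketched as follows. The bit order (first interval mapped to the rightmost bit) and the half-open interval boundaries are assumptions inferred from the worked example, not stated explicitly in the text; the function names are hypothetical.

```python
def encode_gene(value, n_bits=10, threshold=20):
    """Encode one gene value as a bit string with exactly one bit set."""
    width = threshold / n_bits                   # size of one interval
    # Values at or above the threshold fall into the last interval.
    index = min(int(value // width), n_bits - 1)
    bits = ["0"] * n_bits
    bits[n_bits - 1 - index] = "1"               # interval 1 is the rightmost bit
    return "".join(bits)


def encode_antigen(gene_values, n_bits=10, threshold=20):
    """Encode all gene values and join them with spaces, one group per gene."""
    return " ".join(encode_gene(v, n_bits, threshold) for v in gene_values)


print(encode_antigen((3, 2, 7, 6)))
# 0000000010 0000000010 0000001000 0000001000
```

With the default parameters this reproduces the final antigen format given above for the gene values (3 2 7 6).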

Antibody and Matching Function. Antibodies have the same format as antigens (such as
$l_3$), except that they may have any number of nucleotides equal to 1 (whereas an antigen
has exactly one per gene). An antibody matches an antigen (i.e. they are cognate) if the
antibody has a 1 in every position where the antigen has a 1. This is the same as in [9]
and is advocated there as a method that allows a detection system to have good coverage 
of a large set of possible nonself antigens with a relatively small number of antibodies. 

Antibodies are created randomly, uniformly over the set of possible antibodies. Dur- 
ing negative selection, antibodies that match any self antigen are discarded. 
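
A minimal sketch of the matching function and of negative selection, assuming antigens and antibodies are stored as 40-bit integers (4 genes of 10 nucleotides) and that every 40-bit pattern is a possible antibody. The helper names and the `max_tries` safeguard are illustrative additions.

```python
import random

N_BITS = 40  # 4 genes x 10 nucleotides (default values)


def to_bits(text):
    """Parse a bit string such as '0000000010 ...' (spaces ignored) into an int."""
    return int(text.replace(" ", ""), 2)


def matches(antibody, antigen):
    """True if the antibody has a 1 in every position where the antigen has a 1."""
    return (antibody & antigen) == antigen


def negative_selection(self_antigens, n_detectors, rng, max_tries=100_000):
    """Draw random antibodies and keep only those that match no self antigen."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) >= n_detectors:
            break
        candidate = rng.getrandbits(N_BITS)      # uniform over all bit patterns
        if not any(matches(candidate, s) for s in self_antigens):
            detectors.append(candidate)
    return detectors


def is_suspicious(antigen, detectors):
    """An antigen is detected if at least one antibody matches it."""
    return any(matches(d, antigen) for d in detectors)


self_antigen = to_bits("0000000010 0000000010 0000001000 0000001000")
detectors = negative_selection([self_antigen], 5, random.Random(0))
print(is_suspicious(self_antigen, detectors))    # False
```

By construction, no surviving detector matches an antigen that was in the self set used for learning.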



3.3 Node Detection and Classification 



Matching an antigen is not enough to decide that the monitored node misbehaves, since 
we expect, as in any AIS, that false positives occur. We say that a monitored node is 
detected (or “suspicious”) in one data set (i.e. in one interval of duration $\Delta t$) if the
corresponding antigen matches any antibody. So, detection is done per interval of
duration $\Delta t$. A monitored node is classified as “misbehaving” if the probability that
the node is suspicious, estimated over a sufficiently large number of data sets, is above 
a threshold. The threshold is computed as follows. 

Assume we have processed n data sets for this monitored node. Let $M_n$ be the number
of data sets (among n) for which this node is detected (i.e. is suspicious). Let $\theta_{\max}$ be a
bound on the probability of false positive detection (detection of well behaving nodes,
as if they are misbehaving) that we are willing to accept, i.e. we consider that a node
that is detected with a probability $\le \theta_{\max}$ is a correct node (by default we take
$\theta_{\max}$ = 0.08). Let $\alpha$ (= 0.01 by default) be the classification error that we are willing to
accept. We classify the monitored node as misbehaving if

$$\frac{M_n}{n} > \theta_{\max}\left(1 + \frac{\xi(\alpha)}{\sqrt{n}}\sqrt{\frac{1-\theta_{\max}}{\theta_{\max}}}\right) \qquad (1)$$

where $\xi(\alpha)$ is the $(1-\alpha)$-quantile of the normal distribution (for example, $\xi(0.01)$ = 2.33).
As long as Equation (1) is not true, the node is classified as well behaving. With the default
value of $\theta_{\max}$, the condition is $M_n/n > 0.08 + 0.27\,\xi(\alpha)/\sqrt{n}$, i.e.
$M_n/n > 0.08 + 0.63/\sqrt{n}$ for $\alpha$ = 0.01. The derivation of Equation (1)
is given in Appendix 1.
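
The classification rule of Equation (1) can be evaluated numerically with the sketch below. It uses Python's standard `statistics.NormalDist` for the quantile; the function name and argument names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def is_misbehaving(m_n, n, theta_max=0.08, alpha=0.01):
    """Return True if m_n detections out of n data sets satisfy Equation (1)."""
    xi = NormalDist().inv_cdf(1 - alpha)      # (1 - alpha)-quantile, about 2.33
    bound = theta_max * (1 + xi / sqrt(n) * sqrt((1 - theta_max) / theta_max))
    return m_n / n > bound


# With n = 100 data sets the bound is roughly 0.143.
print(is_misbehaving(14, 100))   # False
print(is_misbehaving(15, 100))   # True
```

The bound shrinks toward $\theta_{\max}$ as n grows, so a node that is suspicious only slightly more often than $\theta_{\max}$ needs more data sets before it is classified as misbehaving.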

The detailed detection and classification algorithm that is executed in every node (in 
addition to the standard DSR code) is given in Appendix 2. 

3.4 Detection System Parameters 

In this implementation, the default values of the parameters (Table 1) are chosen from 
extensive pilot simulation runs, as a compromise between good detection results and a 
small memory and computation usage of the detection system. 



Table 1. Detection System Parameters

Parameter                                                      Default value
maximal number of self antigens collected for learning         80
maximal time for collecting self antigens for learning         200 s
maximal time for collecting a data set (an antigen)            10 s
max. number of protocol events recorded in a data set          40
number of detectors (i.e. antibodies)                          300
number of genes in an antigen                                  4
number of nucleotides per gene                                 10
max. accepted misdetection probability $\theta_{\max}$         0.08
targeted classification error ratio $\alpha$                   0.01


4 Simulation Results 

4.1 Description of Simulation 

We have implemented this first pass of our system in Glomosim [15], a simulated envi- 
ronment for mobile ad-hoc networks. We simulate a network of 40 nodes. The simulation 
area is a rectangle with the dimensions of 800 m x 600 m. The nodes are initially de- 
ployed on a grid with the distance between neighbors equal to 100 m. Mobility model 
is random way-point with fixed speed of 2 m/s. The radio range is 450 m. Traffic is 
generated as constant bit-rate, with packets of length 512 bytes sent every 0.2-1 s. We 
inject one misbehaving node. 

For all simulations, the parameters have their default values, except for the number of
self antigens collected for learning, the number of antibodies, and the misbehavior
probability, which are taken as parameters.

Misbehavior is implemented by modifying the DSR implementation in Glomosim. We
implemented two types of misbehavior: (1) not forwarding route requests and not answering
from the route cache and (2) not forwarding data packets. The misbehaving node does
not always misbehave. Instead, it does so with a fixed probability for both types of
misbehavior, which is also a simulation parameter (default value is 0.6).
Performance Metrics: We show simulation results with the following metrics:

True Positive Detection: percentage of successfully detected nonself antigens 
False Positive Detection: percentage of mistakenly detected self antigens 
Correct Classification Time: time until the misbehaving node is classified as such 
(after a sufficiently large number of positive detections occurred, see Equation (1)) 
False Classification: the percentage of well behaving nodes that are mistakenly clas- 
sified as misbehaving. This is much less than false positive detection since classifi- 
cation uses evidence from several antigens. 

We performed 10 independent replications of all experiments, in order to obtain confi- 
dence intervals. All graphs are with 90% confidence intervals. 



4.2 Simulation Results 

Classification capabilities: For all simulation runs, the misbehaving node is detected
and classified as misbehaving by all other nodes. The main effect on other classification
metrics is by the parameter $\alpha$, the targeted classification error ratio (Figure 2). By
decreasing the value of $\alpha$, the false positive classification ratio may be decreased to very
small values (Figure 2(a)), but there is some increase of the time needed for true positive
classification (Figure 2(b)). Some value of $\alpha$ may be chosen as a compromise between
false classification ratio and time until correct classification. For the range of $\alpha$ from
0.05 to 0.03, the values of the false classification ratio are below targeted values, but
for the smaller values of $\alpha$ this is not the case (Figure 2(a)). A reason for this may be
correlation in detection results in consecutive detection intervals.

Fig. 2. The AIS main metrics: (a) False classification ratio and (b) time until correct classification
of misbehaving nodes versus target false positive classification probability parameter $\alpha$. Comment:
the true positive classification ratio is equal to 1, which is not shown on the graphs.

Impact of misbehavior and parameters tuning: Effects of some parameters other than
$\alpha$ and the effect of the misbehavior probability on detection are shown in Figure 3.

Fig. 3. Impact of misbehavior and parameters tuning: Probability of correct detection of
misbehaving nodes (true positive) and misdetection of well behaving nodes (false positive) from a data
set collected during an interval $\Delta t$ = 10 s versus (a) misbehavior probability for the misbehaving
node, (b) number of self antigens collected for learning and (c) number of antibodies.



Figure 3(a) shows that for a very small probability of misbehavior, distinguishing
between good and misbehaving nodes is difficult, but in this case, the impact of
misbehavior on the network is also small. If a node misbehaves with a probability equal
to or greater than 10%, it is well distinguished from well behaving nodes, in the sense
that it is detected in a percentage that is significantly different from that of well behaving
nodes. Even if the percentage of detection is not very high (between 25% and 60%), this
distinction allows good classification results (Figure 2). For a very high probability
of misbehavior, the percentage of true positive detection is slightly decreased, because 
the neighbors of the misbehaving node can collect less data about its behavior, as it does 
not send packets except its own. 

Figure 3(b) shows the effect of the number of self antigens collected for learning.
If the number of self antigens collected for learning is too small, both self and nonself
antigens are mostly detected by antibodies: the antibodies are then mainly random and have good
coverage of both the self and nonself sets of antigens. False positive detection decreases very
fast as the number of self antigens used for learning increases, and it reaches saturation
at about 40 self antigens. As an unwanted effect, true positive detection decreases to
about 60%, because some nonself antigens that are similar to self antigens
will not be detected in some cases.

Figure 3(c) shows the impact of the number of antibodies. Increasing this number 
increases coverage, which increases true positive detection. False positive detection 
remains small because antibodies are mainly self tolerant. Once the curves
saturate, further increasing the number of antibodies only increases processing cost
and does not improve detection. So, as a compromise, some optimal value can be chosen.



5 Conclusions 

We have designed a method to apply an artificial immune system approach for misbe- 
havior detection in a mobile ad-hoc network. Our simulations show good detection and 
classification capabilities but, in this early phase, it is premature to draw conclusions 
about the performance of the AIS approach. We need more experiments to extend our 
initial work to more misbehavior and traffic patterns. Instead, we would like to focus 
now on what the experience of designing this first phase tells us for the future. 

Mapping IS to AIS. The most difficult problem we encountered was the mapping from 
the IS to the concrete problem of DSR misbehavior detection. We have followed the 
approaches in [5,7] but a large number of fundamental issues remain unclear. At the 
highest level, we still wonder what is the best choice for a target unit to be detected: 
the node, or sequence of messages, or a message itself. This choice could have a large 
impact on the design challenges and possible use of the detection system. Even if we 
stay with the mapping we have designed here, things remain vastly open. 

The very definition of genes is one of them. We have defined them in an ad-hoc 
way, trying to guess the definitions that would have the best detection capabilities. We 
made sure to have at least two correlated genes per misbehavior, in order to capture 
it efficiently. Indeed, we propose to use the correlation between genes as an important 
criterion in selecting the genes defined in the antigen structure. In a next phase, we are 
planning to automate the process of selecting genes. Correlation between genes from 
an offered set of genes can be computed from experimental data in a normal operation 
of the network without misbehaving nodes. Good pairs, triples or k-tuples of genes,
which score high cross-correlation, can be selected automatically. The final selection
of candidate genes would still need to be screened by a human expert,
but this would be considerably simpler than designing genes from scratch, as we did.
One can observe that such a gene selection process is not part of the natural IS, but 
one can view it as an accelerated selection process. An alternative is to define genes as 
arbitrary low level bit patterns, and let negative and clonal selection do the job of keeping 
only the relevant antibodies. This would be more in line with the original motivation for 
using an AIS. A problem with this approach is that it would require many genes, and 
the processing effort needed to generate good detectors increases exponentially with the 
number of genes. A possible angle of attack is based on the observation that the IS is 
also a resource management system. Indeed, the IS has mechanisms to multiply IS 
cells and send them to the parts of the body under attack, thus mobilizing resources 
where and when needed. The analog here would be to use randomization: in a steady 
state, only a small, random subset of protocol event sequences is used to create antigens.
When an attack is on (signalled by a danger signal, see below), more events are analyzed 
in the regions that are in danger. 
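
The automated gene selection suggested in this paragraph could start from pairwise correlations computed on traces of normal operation. The sketch below is only one possible reading: the use of the Pearson coefficient, the threshold `min_corr`, the function names, and the toy values are all assumptions made for illustration, not part of the system described here.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long value lists (0.0 if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_gene_pairs(samples, min_corr=0.8):
    """Return (|correlation|, gene_a, gene_b) for strongly correlated pairs,
    strongest first. `samples` maps a gene name to one value per data set."""
    scored = [
        (abs(pearson(samples[a], samples[b])), a, b)
        for a, b in combinations(sorted(samples), 2)
    ]
    return sorted((s for s in scored if s[0] >= min_corr), reverse=True)


# Made-up toy values, one entry per data set, purely for illustration.
toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [5, 1, 4, 2]}
for score, a, b in rank_gene_pairs(toy):
    print(a, b, round(score, 3))
```

Pairs returned by such a ranking would still be candidates for screening by a human expert, as noted above.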

Innate System and Danger Signal: The model we implemented in this first step is not 
able to learn about a changing self, which is not a problem if it is used for a constant 
DSR protocol and in the case of mobility and traffic patterns similar to those presented 
during the learning phase. In order to be able to adapt to mobility model and traffic 
pattern changes, and eventually changes to DSR, the model must be fortified with the 
additional mechanisms of the danger signals [12]. Danger signals could be defined as
network performance indicators (packet loss, delay). In the natural IS, the danger signal 
is intimately linked to the innate system. Here, the innate system could be mapped to 
the traditional approach, i.e. a set of pre-defined detection mechanisms as we developed 
in [4]. It is likely that many new attacks are accompanied with symptoms that are not 
new. Thus the innate system could be used as a source of the danger signal as well. This 
would free resources to focus the adaptive immune system on the detection of truly new, 
unexpected misbehavior. 

Clonal Selection and Memory Effects are not implemented in this step, but it will be
straightforward to do so. The expected benefit is a better reaction to a misbehavior that 
repeats itself. 

Regulation: Translation of non-self antigen matching to misbehaviour detection is done 
in a classical, statistical way. This should be compared to the regulation of B-cell and 
T-cell clonal division [14], which is algorithmically very different. 

Parameter Tuning: Even if an appropriate mapping of IS to AIS is found, it remains 
that the performance is very sensitive to some parameters; the parameters have to be
carefully tuned. It is unclear today whether this dependency exists in the IS, and if natural 
selection takes care of choosing good values, or whether there are inherently stable 
control mechanisms in the IS that make accurate tuning less important. Understanding 
this is key to designing not only an AIS as here, but also a large class of controlled 
systems. 
Appendix 1: Derivation of Equation (1)

We model the outcome of the behavior of a node as a random generator, such that with
unknown but fixed probability $\theta$ a data set is interpreted as suspicious. We assume the
outcome of this fictitious generator is iid. We use a classical hypothesis framework. The
null hypothesis is $\theta \le \theta_{\max}$, i.e., the node behaves well. The maximum likelihood ratio
test has a rejection region of the form $\{M_n > K(n)\}$ for some function $K(n)$. The
function $K(n)$ is found by the type-I error probability condition: $P(\{M_n > K(n)\} \mid \theta) \le \alpha$
for all $\theta \le \theta_{\max}$, thus the best $K(n)$ is obtained by solving the equation

$$P(\{M_n > K(n)\} \mid \theta_{\max}) = \alpha$$

The distribution of $M_n$ is binomial, which is well approximated by a normal distribution
with mean $\mu = n\theta$ and variance $n\theta(1-\theta)$. After some algebra this gives
$K(n) = \sqrt{n}\,\xi(\alpha)\sqrt{\theta_{\max}(1-\theta_{\max})} + n\theta_{\max}$, from which Equation (1) derives immediately.
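
The algebra left implicit in this derivation can be written out as follows. This is a sketch under the normal approximation stated in the appendix, with $\Phi$ denoting the standard normal distribution function (a symbol introduced here for convenience).

```latex
\begin{align*}
P\bigl(M_n > K(n) \mid \theta_{\max}\bigr)
  &\approx 1 - \Phi\!\left(\frac{K(n) - n\theta_{\max}}
        {\sqrt{n\,\theta_{\max}(1-\theta_{\max})}}\right) = \alpha \\
\Longrightarrow\quad
\frac{K(n) - n\theta_{\max}}{\sqrt{n\,\theta_{\max}(1-\theta_{\max})}}
  &= \xi(\alpha) \\
\Longrightarrow\quad
K(n) &= n\theta_{\max} + \sqrt{n}\,\xi(\alpha)\sqrt{\theta_{\max}(1-\theta_{\max})} \\
M_n > K(n) \iff \frac{M_n}{n}
  &> \theta_{\max} + \frac{\xi(\alpha)}{\sqrt{n}}\sqrt{\theta_{\max}(1-\theta_{\max})}
   = \theta_{\max}\left(1 + \frac{\xi(\alpha)}{\sqrt{n}}
     \sqrt{\frac{1-\theta_{\max}}{\theta_{\max}}}\right)
\end{align*}
```

The last line is the classification condition of Equation (1).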
Appendix 2: Detailed Detection and Classification Algorithm

phase = LEARNING;
switch (phase) {
case LEARNING {
    phaseTimer = maximalLearningTime;
    numberOfCollectedDataSets = 0; collectingDataSetsTimer = 0;
    setOfDetectors = createAnEmptySetOfDetectors(); numberOfDetectors = 0;
    while (phase == LEARNING) {
        if (isPacketSentReceivedOrOverheared()) {
            createOrUpdateNeighborsList();
            if (collectingDataSetsTimer == 0 &&
                numberOfCollectedDataSets < maxNumOfCSD) {
                // start a new collection interval of duration deltaT
                openNewCollectingDataSets(); collectingDataSetsTimer = deltaT;
            }//end if
            updateCurrentDataSetsIfTheyAreOpen();
        }//end if
        if (collectingDataSetsTimer == 0) {
            closeCurrentCollectingDataSets();
        }//end if
        if (phase != LEARNING) break;
        if (phaseTimer == 0) {
            //create detectors by negative selection
            setOfSelfAntigens = createSelfAntigensFromCollectedDataSets();
            while (numberOfDetectors < TargetedNumDet) {
                generateANewDetectorByRandom();
                if (isNewDetMatchingAnySelfAntigen()) {
                    deleteNewDetector();
                }//end if
                else {
                    addNewDetectorToSetOfDetectors(); numberOfDetectors++;
                }//end if else
            }//end while
            phase = DETECTINGandCLASSIFYING;
        }//end if
    }//end while
    break;
}//end case

case DETECTINGandCLASSIFYING {
    while (phase == DETECTINGandCLASSIFYING) {
        if (isPacketSentReceivedOrOverheared()) {
            createOrUpdateNeighborsList();
            if (collectingDataSetsTimer == 0 &&
                numberOfCollectedDataSets < maxNumOfCSD) {
                // start a new collection interval of duration deltaT
                openNewCollectingDataSets(); collectingDataSetsTimer = deltaT;
            }//end if
            updateCurrentDataSetsIfTheyAreOpen();
        }//end if
        if (phase != DETECTINGandCLASSIFYING) break;
        if (collectingDataSetsTimer == 0) {
            //do detection and classification
            createAntigensFromCurrentDataSets();
            deleteCurrentDataSets();
            checkAntigensFromLastDeltaTByDetectors();
            updateListOfDetectedNodes();
            updateDetectionResultsForObservedNodes();
            deleteAntigensFromLastDeltaT();
            checkDetectionResultsForClassification();
        }//end if
    }//end while
    break;
}//end case
}//end switch
Abstract. In wireless sensor networks, hundreds or thousands of microsensors
are deployed in an uncontrolled way to monitor and gather
information of environments. Sensor nodes have limited power, computa- 
tional capacities, memory, and communication capability. In this paper, 
we propose a novel scheme for data gathering where sensor information 
periodically propagates without any centralized control from the edge of 
a sensor network to a base station as the propagation forms a concen- 
tric circle. By observing the radio signals emitted by sensor nodes in its 
vicinity, a sensor node independently determines the cycle and the tim- 
ing at which it emits sensor information in synchrony. For this purpose, 
we adopt a pulse-coupled oscillator model based on biological mutual 
synchronization such as that used by flashing fireflies, chirping crick- 
ets, and pacemaker cells. Through simulation experiments, we confirmed 
that our scheme can gather sensor information in a fully-distributed, 
self-organizing, robust, adaptable, scalable, and energy-efficient manner. 



1 Introduction 

With the development of low-cost microsensor equipment having the capabil- 
ity of wireless communications, sensor network technology [1] has attracted the 
attention of many researchers and developers. A sensor node is equipped with 
one or more sensors with analog/digital converters, a general purpose proces- 
sor with a limited computational capacity, a small amount of memory, low-cost 
radio transceiver, and a battery power supply. By deploying a large number 
of multi-functional sensors in a monitored region and composing a sensor net- 
work of them, one can remotely obtain information on behavior, condition, and 
position of elements in the region. Sensor nodes monitor the circumstances and 
periodically or occasionally report sensed phenomena directly or indirectly to the 
base station, i.e., the sink of sensor information, using wireless communication 
channels. Sensor networks can be used in agricultural, health, environmental, 
and other industrial applications. More specifically, Intelligent Transportation 
Systems (ITS) and pervasive computing are typical examples that benefit from 
information gathered from circumstances and environments. A sensor node sends 
its sensor information by radio signals, continuously, periodically, or only when 
it detects an event such as a movement of the object. Since a sensor node usually 
has an omnidirectional antenna, broadcasting is the major means of data emission.



A.J. Ijspeert et al. (Eds.): BioADIT 2004, LNCS 3141, pp. 412-427, 2004.
© Springer-Verlag Berlin Heidelberg 2004




Scalable and Robust Scheme for Data Gathering in Sensor Networks 






Sensor nodes are distributed in a region in an uncontrolled and unorganized way 
to decrease the installation cost and eliminate the need for careful planning. 
Thus, the method used to gather sensor information should be scalable to the 
number of sensor nodes, robust to the failure and disruption of sensor nodes, 
adaptable to addition, removal, and movement of sensor nodes, inexpensive in 
power consumption, and fully distributed and self-organizing without a central- 
ized control mechanism. Several research works have been done in developing 
schemes for data gathering in sensor networks, such as [2,3,4]. However, they 
require so-called global information such as the number of sensor nodes in the 
whole region, the optimal number of clusters, the locations of all sensor nodes, 
and the residual energy of all sensor nodes. Consequently, they need an additional,
and possibly expensive and unscalable, communication protocol to collect
and share the global information. Thus, it is difficult to adapt to the dynamic 
addition, removal, and movement of sensor nodes. 

In this paper, we propose a novel and efficient scheme for gathering data 
in sensor networks where a large number of sensor nodes are deployed; in such 
networks, nodes are randomly introduced, occasionally die or get removed, and 
sometimes change their locations. We consider an application that periodically 
collects sensor information from distributed sensor nodes to a base station. Sen- 
sor information is propagated from the edge of a sensor network to the base 
station. We do not assume that all sensor nodes are visible to each other as in 
other research work. An administrator does not need to configure sensor nodes 
before deployment. Our scheme does not rely on any specific routing protocol, 
and it can be used on any medium access (MAC) protocol. 

In periodic data gathering, power consumption can be effectively saved by 
reducing the amount of data to send, avoiding unnecessary data emission, and 
turning off unused components of a sensor node between data emissions. As 
an example, such data gathering can be attained by the following strategy on 
a sensor network where sensor nodes organize a tree whose root is the base 
station in a distributed manner. First, leaves, i.e., sensor nodes that are the most 
distant from the base station, simultaneously emit their sensor information to 
their parent nodes at a regular interval. The parent nodes, which are closer to 
the base station, receive information from their children. They aggregate the 
received information with local sensor information to reduce the amount of data 
to send. Then, they emit it at a timing that is synchronized with the other sensor 
nodes at the same level in the tree. Likewise, sensor information is propagated 
and aggregated to the base station. As a result, we observe a concentric circular 
wave of information propagation centered at the base station. 

To accomplish the synchronized data gathering without any centralized con- 
trols, each sensor node should independently determine the cycle and the timing 
at which it emits a message to advertise its sensor information based on locally 
available information. The ideal synchronization can be attained by configuring 
sensor nodes prior to the deployment, provided that the clocks of sensor nodes 
are completely synchronized, sensor nodes are placed at the appropriate loca- 
tions, and they maintain their clocks through their lifetime. However, we cannot 
realistically expect such an ideal condition. 

Self-organized and fully distributed synchronization can be found in nature.
For example, fireflies flash independently, at their own interval, when they are
apart from each other. However, when a firefly meets a group, it adjusts an
internal timer to flash at the same rate as its neighbors by being stimulated
by their flashes. Consequently, fireflies in a group flash in synchrony. Mutual
synchronization in a biological system is modeled as pulse-coupled oscillators
[5,6,7]. Each oscillator $O_i$ has a state $x_i$, which is determined by a
monotonically increasing function $f_i : [0,1] \to [0,1]$ of a phase $\phi_i$.
The phase cyclically shifts as time passes. When the state reaches one, an
oscillator fires a pulse and goes back to the initial state $x_i = 0$. The pulse
stimulates other oscillators within a range of pulse propagation and raises
their state $x_j$ by some amount $\varepsilon_i(\phi_j)$ [7]. Those oscillators
whose states are raised to one also fire at this time. In this way, they reach
synchronization. See Section 3.1 for further discussion. In [7], the authors
applied the pulse-coupled oscillator model to clustering algorithms. They
defined the stimulus function as a monotonically increasing function of the
similarity between two objects. If they resemble each other, the stimulus
has a positive value. They are synchronized and constitute a cluster. On the 
other hand, if they are not similar, they give inhibitory stimulus to each other. 
As a result, they behave non-synchronously and group themselves into different 
clusters. In [8], the authors proposed a management policy distribution protocol 
based on firefly synchronization theory. The protocol is based on gossip protocols 
to achieve weak consistency of information among nodes. The rate of updates is 
synchronized in a network through pulse-coupled interactions. They verified that 
their protocol is scalable to the number of nodes in terms of the average update 
latency. Whereas they aimed to distribute a management policy, our application
is designed to collect sensor information at a base station. By adapting the
pulse-coupled oscillator model, we can obtain a fully distributed,
self-organizing, robust, adaptable, scalable, and energy-efficient scheme for data
gathering in wireless sensor networks. By observing the signals that neighboring
sensor nodes emit, each sensor node independently determines the cycle and 
the timing at which it emits a message to achieve synchronization with those 
neighboring sensors and thus draw a concentric circle. 

The rest of the paper is organized as follows. First, in Section 2, we briefly 
introduce sensor networks and the outline of our data gathering scheme. Next, 
we apply a pulse-coupled oscillator model to our data gathering in Section 3. 
Then, we show some experimental results in Section 4. Regarding the results,
Section 5 discusses some additional considerations of our scheme. Finally, we
conclude the paper in Section 6. 



2 Sensor Networks and Proposed Data Fusion Scheme 

To collect sensor information for use by users, applications, or systems, a base 
station is placed at a location from which one can draw information of the region. 
Thus, sensor nodes must organize a network over which information sensed at 
all sensor nodes in the region can be successfully gathered to the base station. 
Since sensor nodes are usually deployed in an uncontrolled manner, they are 
prone to failure, they often move, and they die due to battery depletion. Thus, 
a scalable, robust, adaptable, low-power-consuming, fully distributed scheme is
needed to organize a sensor network and collect sensor information. 

The sensor networks that our scheme assumes have the following characteris- 
tics. Components of a sensor network are hundreds or thousands of sensor nodes 
and a base station. The base station is placed at a preferable location within 
the range of a radio signal from one or more sensor nodes. Sensor nodes are 
deployed in an uncontrolled way. Sensor nodes are dynamically introduced to 
monitor the region more densely or to replace dead sensor nodes. A sensor node 
stops operating when its battery is depleted. A sensor node might be moved 
to another place. A sensor node monitors its surroundings and obtains sensor 
information. A sensor node can hear radio signals from other nodes. A sensor 
node aggregates its local sensor information and the information received from 
other sensor nodes [2,3,4]. A sensor node has a timer. Its phase shifts as time 
passes, but the timer can be adjusted to an arbitrary point. When a timer ex- 
pires, a sensor node emits its sensor information, possibly aggregated with that 
of other nodes, without waiting for the reception of sensor information from 
other sensor nodes. Information emitted by a sensor node can be received by 
other sensor nodes within the range of a radio signal. We do not assume any 
specific MAC protocol. We can adopt CSMA/CA, FDMA, or CDMA, but we
prefer CSMA/CA in this paper for its simplicity. Likewise, our scheme does not
rely on any specific routing protocol; the most suitable one can be applied,
whether it be a flat or hierarchical, multi-hop, tree- or star-based routing
protocol. The routing protocol determines a single sensor
node or a set of sensor nodes that a sensor node can communicate with. 

3 Pulse-Coupled Oscillator and Its Application to 
Scalable and Robust Data Fusion 

In this section, we first introduce the pulse-coupled oscillator model and then give 
a detailed description of our proposed data gathering scheme. By synchronizing 
the message emissions of sensor nodes, sensor information propagates from the 
edge of a sensor network toward the base station at a preferred frequency. 



3.1 Pulse-Coupled Oscillator 

The pulse-coupled oscillator model was developed to explain the synchronous
behaviors of biological oscillators such as pacemaker cells, fireflies, and
crickets [5]. In this section, mainly following the model proposed in [7], we
give a brief explanation of the pulse-coupled oscillator model. Consider a set
of $N$ oscillators, $O = \{O_1, \ldots, O_N\}$. Each oscillator $O_i$ has a
phase $\phi_i \in [0,1]$ and a state $x_i \in [0,1]$, whose relation is
expressed by a phase-state function $f_i$:

$$x_i = f_i(\phi_i). \qquad (1)$$

$f_i : [0,1] \to [0,1]$ is smooth and monotonically increasing, and
$f_i(0) = 0$ and $f_i(1) = 1$ hold.

Fig. 1. An example of a sensor network



As time passes, $\phi_i$ shifts toward one and, after reaching it, jumps back to
zero. The periodic cycle is given as $T_i$, where $d\phi_i/dt = 1/T_i$. When
$x_i$ reaches one, the oscillator fires and $x_i$ is initialized to zero.
Oscillators coupled with the firing oscillator are stimulated. Their state $x_j$
is raised by an amount $\varepsilon_i(\phi_j)$:

$$x_j(t^+) = B(x_j(t) + \varepsilon_i(\phi_j)). \qquad (2)$$

Function $B$ is defined as

$$B(x) = \begin{cases} x, & \text{if } 0 \le x \le 1 \\ 0, & \text{if } x < 0 \\ 1, & \text{if } x > 1 \end{cases} \qquad (3)$$

Their phase $\phi_j$ is given as $\phi_j = f_j^{-1}(x_j)$. When $x_j$ reaches
one, oscillator $O_j$ also fires. Oscillators $O_i$ and $O_j$ are then regarded
as synchronized.

It has been proven that oscillators with different phase-state functions and 
different frequencies attain synchronization from arbitrary initial conditions. 
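
As a minimal sketch of the firing rule above, the following Python fragment steps a small set of identical oscillators from one firing to the next. It is an illustration under stated assumptions rather than the full model of [5,7]: the phase-state function takes the logarithmic form that this paper later adopts as Eq. (4), all oscillators share one period and a constant stimulus, and pulses from oscillators that are pulled to one in the same instant are not propagated further. The names state, phase, clamp, step, and firings_until_sync are hypothetical.

```python
import math

B_DISSIPATION = 3.0  # assumed value of b (the value used later in Section 4)
EPSILON = 0.3        # assumed constant stimulus


def state(phi, b=B_DISSIPATION):
    """Phase-state function f: maps a phase in [0, 1] to a state in [0, 1]."""
    return math.log(1.0 + (math.exp(b) - 1.0) * phi) / b


def phase(x, b=B_DISSIPATION):
    """Inverse of state(): maps a state back to a phase."""
    return (math.exp(b * x) - 1.0) / (math.exp(b) - 1.0)


def clamp(x):
    """Function B of Eq. (3): limit a state to [0, 1]."""
    return min(1.0, max(0.0, x))


def step(phases, eps=EPSILON):
    """Advance all phases until the leader fires, then stimulate the others."""
    lead = max(phases)
    result = []
    for phi in phases:
        if phi == lead:
            result.append(0.0)  # the leader fires and resets
            continue
        x = clamp(state(phi + (1.0 - lead)) + eps)  # Eq. (2)
        # A state raised to one fires at once and resets together with the leader.
        result.append(0.0 if x >= 1.0 else phase(x))
    return result


def firings_until_sync(phases, max_firings=100):
    """Count firings until all phases coincide; None if that never happens."""
    phases = list(phases)
    for n in range(max_firings + 1):
        if max(phases) == min(phases):
            return n
        phases = step(phases)
    return None
```

With the assumed values b = 3.0 and epsilon = 0.3, two oscillators started at phases 0.0 and 0.4 coincide after the first firing, because the stimulus lifts the lagging state to one.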



3.2 Scalable and Robust Data Fusion 

First, we give a brief explanation of the basic behavior of data gathering in 
our scheme. Consider the network of one base station and six sensor nodes as 
illustrated in Fig. 1. Dashed circles stand for the ranges of radio signals. 

We define the level of each sensor node as the number of hops from the base 
station. Two sensor nodes that can receive a radio signal of the base station are
regarded as being on level 1 (open circles). Four sensor nodes that can directly
communicate with nodes on level 1 are on level 2 (filled circles). Information propagates
from sensor nodes on the highest level to the base station. When we consider 
periodic data gathering, it is efficient in terms of power consumption that sensor 
nodes on the same level synchronously inform their parents of their sensor infor- 
mation. In addition, since each sensor node emits its sensor information at its 
own timing without waiting for the reception of sensor information from other 
nodes, the nodes must emit their information at a time slightly before their
parents emit information. For example, if the base station needs information
about the region at time $t$, sensor nodes on level 1 simultaneously emit their
information at $t - \delta$. For them to reflect information gathered on the
higher level, all four sensor nodes on level 2 should emit their information at
$t - 2\delta$ in synchrony with each other. Consequently, if there are level 3
nodes, they emit their information at $t - 3\delta$. If such synchronized data
gathering is attained, the radio component of a sensor node needs to be turned
on only for $\delta$ out of the data gathering interval in this example. Since
emission is synchronized among sensor nodes on the same level, $\delta$ should
be appropriately chosen so that all sensor nodes on the same level can
successfully emit their information despite the existence of collisions on the
medium access layer. Sensor nodes belonging to the same level have to be
synchronized, even if they are geographically apart. In the above example,
synchronization is needed for two sets of sensor nodes, i.e., two open-circle
nodes and four filled-circle nodes. In addition, a set of synchronized sensor
nodes has to synchronize with another set that is closer to the base station but
with a gap of $\delta$.
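
The staggered timing described above can be written down in a few lines. The sketch below assumes an idealized network in which every node on a given level emits one offset earlier than the level below it; the function names and the sample values are hypothetical and are used only for illustration.

```python
def emission_time(t, level, delta):
    """Time at which nodes on `level` emit so that the base station has data at t."""
    return t - level * delta


def emission_schedule(t, max_level, delta):
    """(level, time) pairs in the order in which the levels emit, outermost first."""
    return [(level, emission_time(t, level, delta))
            for level in range(max_level, 0, -1)]


def radio_on_fraction(delta, interval):
    """Share of the data gathering interval during which the radio must be on."""
    return delta / interval
```

For instance, with t = 10.0, three levels, and an offset of 0.2, level 3 emits first, followed by level 2 and then level 1, each 0.2 units of time apart.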

To attain such inter- and intra-level synchronizations, we adapt the pulse- 
coupled oscillator model explained in Section 3.1. The base station emits a bea- 
con signal at a regular interval to make sensor nodes within the range of its 
radio signal synchronize with each other. We denote a set of $N$ sensor nodes as
$S = \{S_1, \ldots, S_N\}$. Sensor node $S_i$ belongs to level $l_i$. Initially,
level $l_i$ is set to infinity or a reasonably large value. It has a timer and a
state $x_i$. A state is given by a monotonically increasing function
$f_i : [0,1] \to [0,1]$ of a phase $\phi_i$ of the timer. We used the following
$f_i$ as an example in the simulation experiments.

$$\forall i,\ f_i(\phi_i) = \frac{1}{b} \ln[1 + (e^b - 1)\phi_i]. \qquad (4)$$

This formula is taken from [5,7]. $b > 0$ is one of the parameters that dominate
the rate of synchronization [5]. As the dissipation $b$ increases, $f_i$ rises
more rapidly and, as a result, synchrony emerges more rapidly. To take into
account the offset $\delta_i$, we consider a regulated phase $\phi'_i$, which is
given by the following equation.

$$\phi'_i = p(\phi_i, \delta_i) = \begin{cases} \phi_i + \delta_i, & \text{if } \phi_i + \delta_i \le 1 \\ \phi_i + \delta_i - 1, & \text{otherwise} \end{cases} \qquad (5)$$

From $\phi'_i$, we obtain a regulated state $x'_i$ by $f_i(\phi'_i)$. Sensor node
$S_i$ emits a message when its regulated state $x'_i$ becomes one. Thus, it fires
$\delta_i$ earlier than state $x_i$ reaches one.

At time $t$, sensor node $S_j$ receives a message from sensor node $S_i$, which is
specified as $S_j$'s next node to the base station by a routing protocol or whose
level $l_i$ is smaller than $S_j$'s level $l_j$. It is stimulated and its state
changes as

$$x_j(t^+) = B(x_j(t) + \varepsilon). \qquad (6)$$

The regulated state $x'_j$ of stimulated sensor $S_j$ is given as
$x'_j = f_j(p(g_j(x_j(t^+)), \delta_j))$, where $g_j = f_j^{-1}$. When sensor
$S_j$'s regulated state $x'_j$ becomes one, it also emits a message in synchrony
with sensor $S_i$. Since collisions occur on the medium access layer, sensor node
$S_i$ ignores messages from $t$ to $t + \delta_i$ when it has already been
stimulated at $t$ to avoid being stimulated by deferred signals, as fireflies do.
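
The chain from state to regulated state in Eqs. (4) to (6) can be sketched as follows. This is an illustrative reading, not the authors' implementation: the function names are hypothetical, and the default values b = 3.0 and epsilon = 0.3 are the ones reported for the simulations in Section 4.

```python
import math


def f(phi, b=3.0):
    """Phase-state function of Eq. (4)."""
    return math.log(1.0 + (math.exp(b) - 1.0) * phi) / b


def g(x, b=3.0):
    """Inverse of f: recovers the phase from a state."""
    return (math.exp(b * x) - 1.0) / (math.exp(b) - 1.0)


def regulated_phase(phi, delta):
    """Eq. (5): shift the phase forward by the offset, wrapping around at one."""
    shifted = phi + delta
    return shifted if shifted <= 1.0 else shifted - 1.0


def stimulate(x, eps=0.3):
    """Eq. (6): raise the state by the stimulus and limit it to [0, 1]."""
    return min(1.0, max(0.0, x + eps))


def regulated_state_after_stimulus(x, delta, eps=0.3, b=3.0):
    """Regulated state x' of a node whose state x has just been stimulated."""
    x_plus = stimulate(x, eps)
    return f(regulated_phase(g(x_plus, b), delta), b)
```

Under these definitions the regulated state reaches one when the phase equals 1 - delta, which is the sense in which a node fires one offset early.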

Fig. 2. Message header format



A message that a sensor node $S_i$ emits to advertise its information contains
the level $l_i$, a stimulus $\varepsilon$ with which it stimulates sensor nodes
around it, an offset $\delta$ for its children to use an identical offset, and
its sensor information possibly aggregated with other sensor information.
Figure 2 illustrates a message format. The level identifier needs only several
bits. If the number of levels exceeds the range of the bits assigned to the
level identifier, we can use those bits in a cyclic way. The stimulus
$\varepsilon$ and the offset $\delta$ take decimal fractions between zero and
one. The offset is in the order of milliseconds to seconds. If the interval of
data gathering is one hour, the offset of one second is expressed as
$1/(60 \times 60) \approx 2.78 \times 10^{-4}$. If it is once a day, the offset
becomes $1/(24 \times 60 \times 60) \approx 1.16 \times 10^{-5}$. The IEEE
single-precision floating-point representation requires 32 bits, but we can use
a smaller number of bits in our scheme. Of course, it is also useful to employ
an absolute value to express the offset. If we use four bits for the level
identifier, and three bits as the exponent and nine bits for the fraction of
each of the stimulus and the offset, a total of twenty-eight bits are needed.
References [2,3] used 2000-bit messages and [4] used 1000-bit messages.
Consequently, our protocol is 1.2% and 2.4% more expensive, respectively, than
those protocols in power consumption for message exchange.
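
The arithmetic behind the offset values and the header size can be checked with a short sketch. It assumes that the stimulus and the offset are each stored with a three-bit exponent and a nine-bit fraction, which is the reading that reproduces the stated total of twenty-eight bits; the function names and the real_fields parameter are hypothetical.

```python
def offset_fraction(offset_seconds, interval_seconds):
    """Offset expressed as a fraction of the data gathering interval."""
    return offset_seconds / interval_seconds


def header_bits(level_bits=4, exponent_bits=3, fraction_bits=9, real_fields=2):
    """Header size, assuming `real_fields` reduced-precision values (stimulus, offset)."""
    return level_bits + real_fields * (exponent_bits + fraction_bits)
```

A one-second offset within a one-hour interval gives offset_fraction(1, 3600), roughly 2.78e-4, and within a one-day interval offset_fraction(1, 86400), roughly 1.16e-5.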

The level that sensor node $S_i$ belongs to is given as the smallest level, say
$l_j$, among messages that sensor node $S_i$ can receive plus one, i.e.,
$l_i = l_j + 1$. A beacon signal from the base station advertises the level
zero. When a new sensor node happens to first receive a message from a faraway
sensor node, it wrongly determines its level at first. As time passes, however,
it receives another signal from a
sensor node that is closer to the base station. At this point, it finally identifies 
its level correctly. We show an example of such a transition of level identification 
later. Since a sensor node ignores a message from a sensor node whose level 
is the same or higher for synchronization, there is no direct interaction among 
sensor nodes on the same level. Therefore, intra-level synchronization is attained 
through inter-level stimulus. 

To summarize, the basic behavior of our sensor network can be explained as 
follows. We first consider the initial stage of deployment where all sensor nodes 
are introduced to a region. The base station begins to emit the beacon signal 
at the regular interval of data gathering. All sensor nodes initialize their levels 
to infinity or a reasonably large value. They also initialize their timers. Each 
sensor node begins to sense its surroundings and stores sensor information into
its memory. When the timer expires, it emits a message to advertise its sensor
information, level, stimulus $\varepsilon$, and offset $\delta$. When it receives
a message from another sensor node, it first compares its level with the level
in the message. If the former is smaller than the latter by two or more, it
ignores the message. If the former is smaller than the latter by one, it
aggregates received sensor information with its locally stored information.
Finally, if the former is larger than the latter, the sensor node is stimulated.
It adjusts its level and raises its state $x$. A stimulated sensor node begins
to emit a message that carries sensor information stored in its memory when the
regulated state $x'$ reaches one. If the state $x$ reaches one by being
stimulated, those two sensor nodes are synchronized at this time. Once
synchronization is attained, a sensor node
switches to a battery-saving mode, which is discussed in Section 5.2. 
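
The receive-side rules summarized above can be sketched as a small Python class. This is an illustrative reading under stated assumptions, not the authors' code: the class and field names are hypothetical, a message from a node on the same level is treated as ignored (as stated earlier for synchronization), and the timer itself is left out.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Message:
    level: int        # level advertised by the sender (0 for the beacon)
    epsilon: float    # stimulus carried in the header
    delta: float      # offset carried in the header
    readings: list    # sensor information, possibly aggregated


@dataclass
class SensorNode:
    level: float = math.inf   # initialized to infinity
    state: float = 0.0        # state x
    delta: float = 0.2        # offset, replaced by the one in received messages
    stored: list = field(default_factory=list)

    def on_message(self, msg):
        if msg.level < self.level:
            # Sender is closer to the base station: adjust level, raise state.
            self.level = msg.level + 1
            self.delta = msg.delta
            self.state = min(1.0, self.state + msg.epsilon)
            return "stimulated"
        if msg.level == self.level + 1:
            # Sender is exactly one level farther out: aggregate its data.
            self.stored.extend(msg.readings)
            return "aggregated"
        # Same level, or two or more levels farther out: ignore.
        return "ignored"
```

A freshly deployed node that hears the beacon (level zero) therefore sets its level to one, and afterwards aggregates only messages advertised with level two.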

Next, we consider the case where a new sensor node is introduced in a sensor 
network in operation. Initially, a new sensor node does not synchronize with 
any other sensor nodes. It monitors its surroundings, emits sensor information, 
and receives messages from sensor nodes in its vicinity, as in the above case. 
Being stimulated several times, its level becomes correctly identified, and its 
timer synchronizes with that of a sensor node whose level is smaller by one. 
When a sensor node disappears due to battery depletion or movement, a sensor 
node that is synchronized with the vanished sensor node will be stimulated by 
another that is audible. If there is no other sensor node with smaller level in its 
vicinity, the sensor node first becomes isolated. Since it does not receive stimuli 
any more, it can recognize the isolation and then it initializes its own level so 
that it can synchronize with other neighboring sensor nodes. 



4 Simulation Results 

In this section, we show some results of simulation experiments to investigate 
the basic behavior and characteristics of the proposed scheme. We employ a 
concentric circular sensor network for an easier understanding in the following 
experiments. We have confirmed that our protocol can successfully achieve de- 
sirable results on any sensor network with an arbitrary distribution of sensor 
nodes. The base station is assumed to be located at the center of the region. 
The range of the radio signal is identical among sensor nodes, and the radius is 
fixed at five units of length. Sensors are randomly placed on the circumferences
of concentric circles whose center is the base station. The $n$-th circle has a
radius of $3n$ units of length. For example, the second circle has a radius of
six units of length. Sensors are placed from the innermost circle. When the
number of sensor nodes on the circumference of the $n$-th circle reaches $10n$,
the subsequent sensor nodes are placed on the circumference of the $(n+1)$-th
circle.
An example of a simulated network is illustrated in Fig. 3 for 100 sensor nodes. 
Thus, when sensor nodes are numbered from the first node placed, the correct 
level of a sensor node can be calculated from its identifier. This allows easier 
investigation but does not restrict the applicability of our scheme. 
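
The placement rule and the mapping from identifier to level can be sketched as follows. The sketch assumes that identifiers start at one, that the angle on each circle is drawn uniformly at random, and that the correct level equals the circle index; the function names and the seed parameter are hypothetical.

```python
import math
import random


def circle_of(node_id):
    """Circle (and expected level) of the node with 1-based identifier node_id.

    Circles 1..n hold 10 + 20 + ... + 10n = 5n(n + 1) nodes in total.
    """
    n = 1
    while 5 * n * (n + 1) < node_id:
        n += 1
    return n


def place_nodes(count, seed=0):
    """Return (node_id, circle, x, y) tuples for `count` nodes around the origin."""
    rng = random.Random(seed)
    nodes = []
    for node_id in range(1, count + 1):
        n = circle_of(node_id)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = 3 * n  # the n-th circle has a radius of 3n units of length
        nodes.append((node_id, n, radius * math.cos(angle), radius * math.sin(angle)))
    return nodes
```

With 100 nodes this fills four circles, so that identifiers 1 to 10 lie on the first circle, 11 to 30 on the second, 31 to 60 on the third, and 61 to 100 on the fourth.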

The phase-state functions $f_i$ are identical among sensor nodes and defined by
(4). In the following experiments, we used $b = 3.0$ [7]. The functions of
stimulus $\varepsilon_i$ are identical and defined as

$$\forall i, \forall j,\ \varepsilon_i(\phi_j) = \varepsilon. \qquad (7)$$

Fig. 3. An example of concentric circular sensor networks

We used $\varepsilon = 0.3$ [5]. Finally, offset values $\delta_i$ are also
identical and given as

$$\forall i,\ \delta_i = \delta. \qquad (8)$$

We used $\delta = 0.2$. This means that sensor nodes on the $n$-th circle emit
their messages earlier than the beacon by $0.2n$ units of time. We call this
condition “the sensor network reaching global synchronization by our scheme.”
In the experiments, we ignore the propagation delay of a radio signal and the
collision of radio signals on the medium access layer. In an actual situation,
$\delta$ must be large enough when there are many sensor nodes to take into
account collisions. However, since sensor nodes on the different levels have
different phases in our scheme, the possibility of collision is reduced. We
should note here that no routing protocol is employed in simulation experiments.
With our proposed scheme, sensor information propagates to the center of the
circle without the help of routing protocols.

4.1 Basic Behavior 

First, we show simulation results for the case where the sensor network has 100 
sensor nodes. Initial states of the sensor nodes take random values from 0.0 to 
1.0 that follow a uniform distribution. A simulation experiment stops when a 
sensor network reaches global synchronization. In this section, we assume that 
timers of sensor nodes have the same timer period. Thus, timers expire at the 
same frequency. When there exist timers with different frequencies, the fastest 
timer would dominate the synchronization as stated in [5]. Thus, the interval
of beacon signals from the base station, which controls the frequency of data
gathering, should be the smallest timer period in the sensor network.

Figure 4 illustrates the transitions of levels of sensor nodes s1 and s2 on the
first, s11 and s12 on the second, s31 and s32 on the third, and s61 and s62 on the
fourth circle. Initially, their levels are set to reasonably large values, for example, 
larger than the number of sensor nodes. The initial value does not affect the time 
to reach global synchronization. When a sensor node receives a radio signal from 
a sensor node whose level has already been determined, it can identify its level. 

Fig. 4. Transition of level



Fig. 5. Timing of message emission 



In the figure, sensor nodes s1 and s2, which are on the innermost circle, received
a beacon signal at time 0.673 and found that their levels were one. Then, sensor
nodes s11 and s12 received radio signals from sensor nodes on the first circle
at 0.712 and set their levels to two. Sensor nodes s31 and s32 happened to first
receive a radio signal from a sensor node on the same circle, i.e., the third
one. As a result, they wrongly identified their levels as four at 2.11 and 0.990,
respectively. However, at 2.27, they received a radio signal from a sensor node 
on the second circle and changed their levels to three. In this example, global 
synchronization was accomplished at 5.07. 

Figure 5 shows how the sensor network reaches the global synchronization. 
Dots on lines stand for instants when sensors emit messages. It can be seen that 
each sensor first flashed independently of the others based on its local timer. 
However, as time passed, sensor nodes on the same circle became synchronized 
by being stimulated by radio signals that sensor nodes on the inner circle emitted. 
They began to flash in synchrony with other sensor nodes on the same circle and 
earlier than sensor nodes on the inner circle by the offset, $\delta = 0.2$. Finally, global
synchronization was accomplished at 5.07. Observing the rightmost dots on all 
nine sensor nodes, it can be seen that sensor nodes emit messages in synchrony 
with other sensor nodes on the same circle at exactly 0.2 units of time ahead of 
the data emission by sensor nodes on the inner circle. 



4.2 Dynamic Deployment and Removal of Sensor Nodes 

In the experiments described in the preceding section, all 100 sensor nodes were 
fully deployed at the initial stage. However, in an actual situation, sensor nodes 
are added to a sensor network as well as removed or stopped occasionally. Our 
scheme can reach the desired condition on such dynamic sensor networks. 

The following figures were obtained from simulation experiments where sen- 
sor nodes were deployed in the region at random one by one. In addition, they 
also stopped working at random one by one. The time at which a sensor node was
deployed follows a uniform distribution from 0.0 to 10.0 units of time. The time
at which a sensor node stopped follows a uniform distribution from 15.0 to 25.0
units of time.

Figure 6 illustrates how newly introduced sensor nodes identify their levels. 
The initial level of a newly deployed sensor node is set to a large value. The level 
is gradually adapted as it encounters another sensor node through reception 
of radio signals, as described in Section 3.2. Since we cannot give a detailed 
explanation of the figure due to space limitation, we focus on sensor node s1 on
the innermost circle, whose trajectory is depicted with a solid line. Sensor node
s1 was deployed at 9.05. It first received a radio signal from a sensor node on
the second circle and wrongly considered itself to be on the third circle. Then, it
observed a radio signal from a sensor node on the first circle at 9.25, and changed
its level to two. Finally, at 9.45, it received a beacon, i.e., a radio signal that
the base station emits. Then sensor node s1 identified its level as one. We can
expect similar transition to the global synchronization during the movement of 
a sensor node if it initializes its own level while moving. 

Figure 7 shows a series of message emissions of sensor nodes s1, s2, s11, s12,
s31, s32, s61, and s62, as in Fig. 5. In this experiment, global synchronization 
was attained at 13.7. It is obvious from the figure that sensor nodes do not lose 
synchronization once they are fully synchronized even if sensor nodes disappear. 

If a sensor node does not fall within radio range of any other sensor node, 
it is isolated. However, the sensor node can join the sensor network again when 
it is moved closer to one of the other sensor nodes. If new sensor nodes are 
deployed between the isolated sensor node and the sensor network, it can join 
the network through the mediation of the new nodes. An isolated sensor node 
can find neighboring sensor nodes by extending the range of radio propagation to 
inform another sensor node of its existence. When another sensor node adjusts 
the range of its radio signal to reach the isolated sensor node, that node can join 
the sensor network and attain synchronization again. 

From the above observations, we can conclude that our scheme can adapt to 
the dynamic changes in sensor networks, including the addition, removal, and 
movement of sensor nodes. A sensor network reaches synchronization even if 
new sensor nodes are deployed or sensor nodes move. A sensor network does not
lose synchronization once it is attained even if sensor nodes stop due to battery 
depletion. 



4.3 Larger Sensor Network 

Our scheme can be applied to sensor networks whose region is large and/or where 
a large number of sensor nodes are deployed, since there is no centralized control 
and it is highly ad hoc and self-organizing. 

However, as the number of sensor nodes increases, the time needed to reach 
global synchronization increases. Although it has been proved that “the time 
taken to synchronize is inversely proportional to the product $\varepsilon b$” [5], we need
further detailed investigation into the influence of those parameters, the number 
of sensor nodes, and the range of stimulus. 



4.4 Frequency of Data Gathering 

The frequency and the timing of data gathering can be controlled through ad- 
justing the emission of beacons from the base station. The beacon dominates the 
synchronization. In Fig. 8, we show the course of synchronization when the base 
station changes the frequency of beacon emission. At 6.41, global synchronization 
was accomplished. At 14.6, the interval of beacon signals was reduced to half.
The change propagated through the sensor network to the edge and, finally, the
sensor network reached global synchronization at 22.9 with the reduced interval.

In this example, we slightly modified the scheme described in Section 3.2:
Sensor node $S_i$ emits a message at $\phi_i = 1.0 - \delta_i$. Consider the case
where sensor node $S_i$ is synchronized with sensor node $S_j$, whose level $l_j$
is $l_i - 1$. When sensor node $S_j$ emits a message, $\phi_i$ is one when the
sensor nodes are synchronized. Now, the frequency of sensor node $S_j$ is
doubled. When sensor node $S_j$ fires, $\phi_i$ is only 0.5, but sensor node
$S_i$ is stimulated and $\phi_i$ is raised from 0.5 to 1.0. Thus, if $\delta_i$
is smaller than 0.5, sensor node $S_i$ does not have a chance to emit a message.
If $\delta_i$ is larger than 0.5, sensor node $S_i$ emits a message later than
the appropriate timing by 0.5. To overcome this problem, a sensor node adjusts
its offset. When sensor node $S_i$ becomes synchronized with sensor node $S_j$
and maintains synchronization for several cycles, it changes the offset as
$\delta_i \to 1.0 - \phi_i + \delta_i$. In the above example, $\delta_i$ becomes
$0.5 + \delta_i$ and sensor node $S_i$ emits its sensor information earlier than
the emission of sensor node $S_j$ by $\delta_i$ as expected.
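
The offset adjustment can be illustrated with a worked example in Python. The sketch assumes the modified rule of this subsection, in which a node emits at phase 1.0 minus its offset; the function names and the sample values are hypothetical.

```python
def adjusted_offset(phi_at_stimulus, delta):
    """New offset after the update rule: delta -> 1.0 - phi + delta."""
    return 1.0 - phi_at_stimulus + delta


def emission_phase(delta):
    """Phase at which a node with offset delta emits its message."""
    return 1.0 - delta
```

With the parent firing when the node's phase is 0.5 and an original offset of 0.2, the adjusted offset is 0.7, so the node emits at phase 0.3, which is 0.2 ahead of the parent's firing at phase 0.5.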



5 Further Discussions 

In the preceding sections, we verified that our scheme was fully distributed and 
self-organizing, adaptable to dynamic changes of sensor networks, and scalable to 
the number of sensor nodes. In this section, we give further consideration to our 
data gathering scheme from the viewpoints of robustness and energy efficiency. 







N. Wakamiya and M. Murata 




Fig. 8. Timing of message emission (changing frequency). Horizontal axis: time (0 to 30).



5.1 Robustness to Failure of Sensor Nodes 

By robustness, we mean that sensor information can be continuously gathered 
from sensor nodes at the desired rate even during the failure of some sensor 
nodes. 

When a radio transmitter fails, a sensor node cannot emit its sensor informa- 
tion. Before global synchronization, the broken node cannot contribute toward 
synchronization because it cannot stimulate other sensor nodes. However, as 
long as all sensor nodes can find a sensor node whose level is smaller, global 
synchronization can be accomplished. The broken node itself can synchronize 
with others. Thus, its sensor information successfully reaches the base station. 
After global synchronization, the failure of a radio transmitter has no influence 
on synchronized data gathering. 

If a radio receiver fails before global synchronization, its sensor node does 
not become synchronized with the other sensor nodes. As a result, it continues 
to emit its sensor information at its own interval, independently of the others. 
If it has wrongly identified its level, neighboring sensor nodes that receive radio 
signals only from the failed sensor node are influenced and become isolated from 
the sensor network. Other sensor nodes can correctly determine their levels and 
attain global synchronization among themselves. 

On the other hand, if the failed sensor node has correctly identified its level 
before the failure, its message emission disturbs global synchronization. A sensor 
node on the next level receives radio signals from both normal and failed sensor 
nodes at different phases. Being stimulated by those non-synchronized signals,
the state and phase of the sensor node do not converge, and thus it never
becomes synchronized. Consequently, global synchronization cannot be accomplished
in this case. However, it is easy to solve this problem. When the failed
node stops message emission or sets its level at a large enough value, it never 
stimulates other sensor nodes and there is no disturbance. 

In some cases, a timer gains or loses, being affected by, for example, geo- 
magnetism. A sensor node with a wrong timer regains synchronization through 
reception of radio signals from sensor nodes on the lower level. Sensor nodes that 
are stimulated by the failed sensor node deviate from the global synchronization. 
However, since they are also stimulated by other correct sensor nodes, they again 
reach synchronization. 



5.2 Energy Efficiency 

Since a sensor node is typically operated with a battery, which often cannot be 
replaced, power management plays a vital role in sensor networks. All of the 
components that constitute a sensor node consume battery life when they are 
active and idle [10,9,11]. In addition, a radio transceiver expends battery power 
in sending and receiving data. Previous research work on data gathering for 
sensor networks [2,3,4] took into account power consumption in gathering data 
from sensor nodes in order to prolong the lifetime of sensor networks. In this sub- 
section, we investigate how energy-efficient data gathering can be accomplished 
with our proposed scheme. 

Since our scheme can attain a global synchronization that effectively sched- 
ules the emission of sensor information, we can save power consumption by 
turning off unused components of a sensor node between periodic message emis- 
sions. Before global synchronization, a sensor node should keep awake to listen 
for radio signals of other sensor nodes and to emit a message as stimulus for 
others. However, after global synchronization is attained, a sensor node can 
move to a power-saving mode. It turns off unused components including a radio 
transceiver from $\phi_i = 0.0$ to $1.0 - \delta_i$. At $\phi_i = 1.0 - \delta_i$, a sensor node turns on a 
radio transceiver to emit a message. Then, at $\phi_i = 1.0$, it receives radio signals 
from sensor nodes, which it can use to confirm that it is well synchronized. Then, 
its phase $\phi_i$ returns to zero and the sensor node goes to sleep again. As a result, 
battery consumption can be reduced to a fraction $\delta_i$ of that of fully active operation. 
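
The sleep schedule described above can be illustrated with a minimal sketch. It assumes that the phase runs from 0 to 1 and that the transceiver is on exactly while the phase is at least 1 minus the offset; the function names and the sampling resolution are hypothetical and not part of the proposed scheme.

```python
def radio_is_on(phase, delta):
    """Assumed rule: the transceiver is on for phase in [1 - delta, 1]."""
    return phase >= 1.0 - delta


def awake_fraction(delta, steps=1000):
    """Estimate the fraction of one cycle spent awake by sampling the phase."""
    awake = sum(radio_is_on(k / steps, delta) for k in range(steps))
    return awake / steps


if __name__ == "__main__":
    for delta in (0.05, 0.1, 0.2):
        print(delta, awake_fraction(delta))
```

Under these assumptions the awake fraction is approximately equal to the offset, which matches the claimed reduction in battery consumption.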

However, a sensor node itself cannot detect global synchronization because 
it can perceive only the sensor nodes around it. Thus, we propose to start the 
power-saving mode when a sensor node considers it is synchronized with one 
or more sensor nodes whose level is smaller than its own level by one. When 
a sensor network has not yet reached global synchronization, the timers of the 
sensor nodes that a sleeping sensor node relies upon might either gain or lose. 
When they gain, the sensor node receives radio signals at the phase $\phi_i < 1.0$. 
Since it is awake, it is stimulated. When they lose, radio signals reach the sensor 
node while it is sleeping. Thus, it cannot accomplish synchronization. To attain 
synchronization again, a sensor node stops the power-saving mode when it does 
not receive any valid radio signals while it is awake. 

The power-saving mode introduces another problem when it is activated 
before global synchronization. As shown in Figs. 4 and 6, a sensor node oc- 
casionally misidentifies its level. When a sensor node reaches synchronization 
with a wrongly identified level, it cannot correct the level while in the 
power-saving mode. For example, consider the case where newly deployed sensor node 
$s_i$, whose actual level is $k - 1$, accidentally receives a radio signal from sensor 
node $s_j$ of level $l_j = k - 1$. It wrongly considers its level to be $l_i = k$ and 
becomes synchronized with sensor node $s_j$. The power-saving mode starts because 
synchronization is attained. Sensor node $s_i$ sleeps from $\phi_i = 0$ to $1.0 - \delta_i$, 
but it can hear a radio signal from sensor node $s_j$ at $\phi_i = 1.0$. To correct the 
level, sensor node $s_i$ must receive a radio signal from a sensor node of level 
$k - 2$. However, because of the offset $\delta_j$, the signal reaches sensor node $s_i$ at 
$\phi_i = 1.0 + \delta_j = \delta_j$. Since $\delta_j$ must be small enough from the viewpoint of energy 
efficiency, $\delta_j < 1.0 - \delta_i$ usually holds. Thus, sensor node $s_i$ cannot receive a 
radio signal from a sensor node of level $k - 2$ and thus cannot correct its level. 
One possible solution to this problem is to turn on a radio transceiver around 
• 

6 Conclusion 

In this paper, inspired by biological systems, we proposed a novel scheme for data 
gathering in sensor networks that is fully-distributed, self-organizing, robust, 
adaptable, scalable, and energy efficient. Through simulation experiments, we 
confirmed that our scheme provides those advantages. We are now considering 
ways to make data gathering even more efficient. When sensor nodes are deployed 
densely, there are areas that are monitored by two or more sensor nodes. In such 
a case, it is a waste of energy to collect duplicated information from all of those 
sensor nodes. We first organize a cluster of the sensor nodes that monitor the 
same area and then make one of the sensor nodes in the cluster advertise the 
sensor information. In a pulse-coupled oscillator model, we can attain a phase- 
lock condition, where oscillators fire with a constant phase difference, by using 
negative stimulus. We need to consider how sensor nodes are clustered and how 
the stimulus should be determined in a distributed way. 

In some cases, a variety of sensor nodes is deployed in a region, and they might 
have different sensing frequencies. In our scheme, all sensor nodes are synchro- 
nized, and sensor information is gathered from all sensor nodes at an interval 
identical to the fastest timer period among the sensors. Such global synchro- 
nization apparently wastes the battery life of infrequent sensors. The proposed 
scheme seems useful for overcoming this problem, but we need to further study 
this issue in more detail. 


Abstract. Biomedical High-Throughput Screening (HTS) requires specific 
properties of image compression. Especially when archiving a huge number of 
images of one particular experiment, the time factor is often rather 
secondary, while other features such as lossless compression and a high 
compression ratio are much more important. Due to the similarity of all 
images within one experiment series, a content based compression seems 
to be especially applicable. Biologically inspired techniques, particularly 
Artificial Neural Networks (ANN) are an interesting and innovative tool 
for adaptive intelligent image compression, although with JPEG2000 a 
promising alternative has become available. 



1 Introduction 

The importance and spread of High-Throughput Screening (HTS) [1] methods, 
being applied in almost all scientific fields, from life science to engineering, is 
increasingly high. All of these have in common that thousands, millions or even 
more equal or similar experiments or investigations are performed. Besides the 
actual result of a particular experiment, which is often rather low-dimensional 
- sometimes even a simple yes/no decision - HTS often leads to an immense 
quantity of data. This may be more or less temporary, but has to be stored for 
example to be processed in a batch process or for further reference. Depending 
on the particular application this data is more or less high-dimensional. 

Especially when dealing with image based data processing, the amount of 
data to be handled often becomes very high. A typical high-resolution digital 
image coming from a microscope is between 15 and 50 MBytes, i.e. 3.000 x 2.000 
pixels and 16 bit per color channel leads to a net size of about 35 MBytes. As a 
technical consequence to save storage capacity, the images can be compressed. 
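
The size estimate above follows from simple arithmetic, sketched below. The function name and its defaults are illustrative assumptions; only the pixel dimensions and bit depths are taken from the text.

```python
def net_image_bytes(width, height, channels=3, bits_per_channel=16):
    """Uncompressed (net) image size in bytes, without any file header."""
    return width * height * channels * bits_per_channel // 8


if __name__ == "__main__":
    size = net_image_bytes(3000, 2000)
    # Reported both in units of 10^6 bytes and of 2^20 bytes.
    print(size, size / 10**6, size / 2**20)
```

For a 3000 x 2000 image with three 16 bit channels this gives 36 million bytes, which is about 35 MBytes depending on whether a megabyte is counted as 10^6 or 2^20 bytes.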

In contrast to on-line compression, where data and particularly images are 
compressed and decompressed on the fly, such as on the Internet, compression 
time does not play a very important role when the purpose of storing is (long-term) 
archiving. On the other hand, scientific applications require not just visual 
correctness of the images (having a pretty picture for the human eye); usually any 
kind of distortion is intolerable. Details must not be lost, since further processing 
is performed after the images have been compressed/decompressed, maybe 
even several times. This motivates the utilization of lossless coding schemes. 

Another important fact should be kept in mind when talking about HTS 
properties. The image content in all images of a particular HTS application is 
often very similar. In other words, the images to be compressed are characterized 
by a limited variability, because they contain for instance the same type of cell 
formation. In this example the meaningful information is not a cell itself, but 
maybe slight differences of its shape or texture. These can only be recognized 
correctly if the images are not degraded by compression artifacts. This opens 
a realistic chance to apply an image content based compression. Besides some 
fractal approaches [2], Artificial Neural Networks (ANN) have been used more 
successfully [3], [4], [5], [6]. 

This paper reviews standard as well as intelligent and adaptive image com- 
pression algorithms in the context of High Throughput Screening. It demon- 
strates the features of different approaches by means of a number of real-world 
images coming from biomedical investigations. 



2 Standard Image Compression Algorithms 

2.1 Supply and Demand 

As already mentioned in Sect. 1, a lossless or nearly lossless compression is 
desired to meet the requirements of this kind of scientific image processing. It 
seems to be unnecessary to outline that a color depth of at least 3 x 8 bit is 
essential. 

Having set up these constraints, there are several standard image file formats 
(see Tab. 1) available, offering a more or less wide variety of internal compression 
algorithms [7], [8]. While RLE and LZW based compression [9] is inherently 
lossless, JPEG [10] is a lossy compression format. It still offers an excellent 
balance of quality and compression. JPEG2000 [11] can be both lossy and lossless 
(nearly lossless). 

All of these image file formats have more or less different properties. When 
evaluated in a particular application context, some properties are rather 
advantageous and some tend to be disadvantageous. In general, all image file formats 
have their strengths when utilized in the right way. In HTS, png and tif are 
most suitable. The tif format is more frequently implemented in image 
acquisition software. 

Depending on the image acquisition hardware actually used, mainly the CCD 
or CMOS chip [12], [13], there are also some proprietary raw formats available. 
Since the image acquisition system simply stores the raw data that comes 
directly off the chip, it is generally system (vendor) dependent. Special software 
is required to transform the image to more versatile formats, e.g. tif, or to 
prepare it for further processing. Although this is the most basic and, by the way, 
also a technically very interesting way to store a digital image (not even the 
interpolation of the four monochrome photo elements making up one color pixel 
is performed), it is not suitable for general purpose HTS applications. 

Table 1. Selection of standard file formats and compression schemes as well as their 
suitability for storing image data in a High Throughput Screening environment. 

File format                                  Compression algorithm            Max. color depth   HTS suitability
Bitmap (bmp)                                 None                             3 x 8 bit          -
                                             Run Length Encoder (RLE)         3 x 8 bit          +
Graphics Interchange Format (gif)            Lempel-Ziv Welch (LZW)           1 x 8 bit          --
Joint Photographic Experts Group             Discrete Cosine Transformation   3 x 8 bit          -
  (jpg) or (jpeg)
JPEG2000                                     Discrete Wavelet Transformation  3 x 8 bit          +
                                                                              3 x 16 bit         ++
Portable Network Graphics (png)              ZLib, CRC-32                     3 x 16 bit         ++
Tagged Image File Format (tif) or (tiff)     None                             3 x 16 bit         -
                                             Lempel-Ziv Welch (LZW)           3 x 16 bit         ++



2.2 Standard Lossless Image Compression in Practical Operation 

This subsection summarizes some measured values of standard file based lossless 
compression algorithms in terms of the compression ratio. For this investigation 
and for those in the following sections as well, a set of biological and medical 
images was used. Samples of the extensive data set are shown in Fig. 1. 



Table 2. Lossless image compression: file size and resulting compression ratio based 
on the test image in Fig. 1 (left). The given file sizes may slightly vary depending on 
the used software. 

Image file format   Resulting file size   File size ratio   Comment
net                 16.068.000 bytes      1.00              net image size, without file header
tif                 16.084.696 bytes      1.00              original microscope file
bmp                 16.068.054 bytes      1.00              without compression
bmp                 11.472.828 bytes      0.71              RLE compression
png                  6.367.697 bytes      0.40
tif                 15.452.899 bytes      0.96              LZW compression (8k block) [7]
tif                  7.319.552 bytes      0.46              LZW compression (16k block) [7]

Fig. 1. Two sample images of the test data set showing microscope images of barley 
leaves. The original image size is 2600 x 2060 pixels with a color depth of 3 x 8 bit. 



The results shown in Tab. 2 indicate that png and tif (16k LZW) outperform 
all other lossless variants. Both are able to compress the images by more than 
50 percent. The results obtained with other images of the test data set are very 
similar. 
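
The file size ratios in Tab. 2 can be reproduced from the listed byte counts, as in the following sketch. The dictionary keys and the function name are illustrative; the byte counts are those given in the table, with the net image size as reference.

```python
NET_SIZE = 16_068_000  # 2600 x 2060 pixels x 3 bytes, without file header

FILE_SIZES = {
    "bmp (RLE)": 11_472_828,
    "png": 6_367_697,
    "tif (LZW, 8k block)": 15_452_899,
    "tif (LZW, 16k block)": 7_319_552,
}


def size_ratio(file_bytes, net_bytes=NET_SIZE):
    """Compressed file size relative to the net image size."""
    return file_bytes / net_bytes


if __name__ == "__main__":
    for name, size in FILE_SIZES.items():
        ratio = size_ratio(size)
        print(f"{name:22s} ratio {ratio:.2f}  saving {100 * (1 - ratio):.0f} %")
```

A ratio below 0.50 corresponds to a reduction of more than 50 percent, which in this table holds for png and for tif with 16k LZW blocks.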

Lossy compression (e.g. JPEG) can squeeze the images to nearly any desired 
size at the cost of more or less distinct image degradation. These issues are 
discussed in the next sections. 



3 Intelligent Biologically Inspired Image Compression 

3.1 Motivation 

Universal compression algorithms usually process all images equally. In other 
words, possibly existing a-priori information on the image content is not passed 
to the compression algorithm. In general it seems to be difficult to gather this 
information at all. Furthermore, this would limit the versatility - resulting in a 
hardly manageable number of very specialized compression methods and image 
file formats. For most cases existing file based compression methods meet the 
requirements very well. After all, who wants to analyze the image content before 
transmitting it via the Internet? 

A different situation can be found in image based HTS. Here all images often 
contain very similar information and since there is a very high number of images 
to be processed, it seems to be worthwhile and sometimes even necessary to 
search for more sophisticated solutions. 

In many successful implementations, artificial neural networks have proven to 
be a key component offering all the advantages of an intelligent problem 
solution. Now intelligent image compression seems to be another challenging 
application for neural nets [4]. 



3.2 Auto-associative Multiple-Layer Perceptrons 

The most straightforward method of applying neural nets to image compression 
is the utilization of Multiple-Layer Perceptrons (MLP) [5]. 

MLPs [14], [15], [16] are the most frequently used neural nets trained in a 
supervised manner. When set into an auto-associative mode, they are able to produce an 
output vector which is, apart from a small remaining error, identical to the 
currently presented input vector. Assuming the length m of the hidden layer 
is smaller than the length n of both input and output layers, all information 
is compressed by the ratio m/n while passed from input to hidden layer. The 
inverse operation is performed between hidden and output layer. In contrast 
to the more common hetero-associative mode of feed-forward nets, the actually 
meaningful result, namely the compressed input vector, is not provided by the 
output but by the hidden layer. 

The network may contain one or three hidden layers. In the latter case the 
compression/decompression is made in two steps each and the result is provided 
by the middle hidden layer. The additional hidden layers (1 and 3) are usually 
of equal length k with (m < k < n). In general all layers of the net are fully 
connected. 



[Figure 2 diagram: Original Image -> Image Block -> Input Vector -> Compression -> Compressed Image Block -> Decompression -> Output Vector -> Decompressed Image Block -> Restored Image] 


Fig. 2. The original image is divided into equally sized blocks (bx x by). All blocks are 
consecutively transformed into an input vector of length n = bx x by. The hidden layer 
is built up of m < n neurons and contains the compressed image block, m/n denotes 
the compression ratio. 

The input image is divided into blocks of typically 8 x 8 pixels and then 
rearranged into a vector of length 64 or 192 containing all color triples (see Fig. 2). 
In case of just one hidden layer, each weighted connection can be represented 
by $w_{ij}$ ($i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$) or by a weight matrix of size n x m 
and m x n respectively. While the network is trained to minimize the quadratic 
error between input and output, the weights are internally changed to find an 
optimal average representation of all image blocks within the hidden layer. In 
connection with the nonlinear activation function $f$ of the neurons, an optimal 
transformation from input to hidden layer and vice versa from hidden to output 
layer has to be found during the network training by changing the weights $w_{ij}$. 
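
The block decomposition can be sketched as follows. This is a minimal illustration, assuming the image is given as a list of pixel rows in which each pixel is either a single gray value or a color triple; the function name is hypothetical, and incomplete blocks at the right and bottom borders are simply dropped here, which the text does not specify.

```python
def image_to_block_vectors(image, block=8):
    """Split an image into block x block tiles and flatten each into a vector.

    Gray-level pixels yield vectors of length block * block (64 for 8 x 8),
    color triples yield vectors of length 3 * block * block (192 for 8 x 8).
    """
    height, width = len(image), len(image[0])
    vectors = []
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            vector = []
            for row in image[top:top + block]:
                for pixel in row[left:left + block]:
                    if isinstance(pixel, (tuple, list)):
                        vector.extend(pixel)   # color triple
                    else:
                        vector.append(pixel)   # single gray value
            vectors.append(vector)
    return vectors


if __name__ == "__main__":
    gray = [[r * 24 + c for c in range(24)] for r in range(16)]
    blocks = image_to_block_vectors(gray)
    print(len(blocks), len(blocks[0]))
```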

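For a single normalized pixel $p_i \in [0, 1]$ of the original image this 
transformation can be written as 

$$h_j = f\left(\sum_{i=1}^{n} w_{ij}\, p_i\right) \quad \text{with } (1 \le j \le m) \qquad (1)$$

for the compression and 

$$\hat{p}_i = f\left(\sum_{j=1}^{m} w'_{ji}\, h_j\right) \quad \text{with } (1 \le i \le n) \qquad (2)$$

for the decompression, where $h_j$ denotes the output of hidden neuron $j$, 
$\hat{p}_i$ the restored pixel and $w'_{ji}$ the weights between hidden and output layer. 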
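
The compression and decompression steps of Eqs. (1) and (2) can be sketched as a forward pass through a network with one hidden layer. This is only an illustration under several assumptions: a logistic sigmoid is used as activation function, the weights are random and untrained, and all names are hypothetical. A usable compressor would first have to be trained to minimize the quadratic error between input and output.

```python
import math
import random


def sigmoid(x):
    """Assumed nonlinear activation function."""
    return 1.0 / (1.0 + math.exp(-x))


def compress(pixels, w_in):
    """Input to hidden layer: w_in is an n x m weight matrix."""
    n, m = len(w_in), len(w_in[0])
    return [sigmoid(sum(w_in[i][j] * pixels[i] for i in range(n)))
            for j in range(m)]


def decompress(hidden, w_out):
    """Hidden to output layer: w_out is an m x n weight matrix."""
    m, n = len(w_out), len(w_out[0])
    return [sigmoid(sum(w_out[j][i] * hidden[j] for j in range(m)))
            for i in range(n)]


random.seed(0)
n, m = 64, 16  # 8 x 8 gray-level block, compression ratio m/n = 0.25
w_in = [[random.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(n)]
w_out = [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(m)]
block = [random.random() for _ in range(n)]

code = compress(block, w_in)        # compressed image block (hidden layer)
restored = decompress(code, w_out)  # restored block (output layer)
print(len(block), len(code), len(restored), m / n)
```

Only the hidden-layer vector has to be stored, so the number of hidden neurons fixes the compression ratio in advance.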



4 Adaptive Intelligent Image Compression 

The key motivation for the utilization of content based image compression, 
namely the similarity of all images within a series of investigations, would be 
even stronger in an adaptive compression algorithm. The idea is to analyze the 
content of an image or a currently processed part of it and then select a very 
specialized compression. 

File based compression methods, as described in Sect. 2, treat all images the 
same way, regardless of their content. This is the most universal method and 
exactly what is to be expected from a versatile graphics file format. 

Also the neural networks solution explained in Sect. 3 works without respect 
to different content of the processed image blocks. However, this is already more 
specialized since the training data set inevitably contains images with limited 
variability. By selecting the training images, some kind of hand-operated spe- 
cialization is performed. 

Although in a completely different context, there are some investigations 
demonstrating that image blocks of relatively small size (up to 32 x 32 pix- 
els) contain sufficiently limited information to be sorted into a small number 
of classes [17]. This is the major condition to successfully implement an image 
block adaptive compression system. The general idea is shown in Fig. 3. 

Each image block is processed by a specialized compression method which is 
optimized for the members of a particular class. There is a strong correspondence 
between a set of blocks belonging to the same class and the applied compression 
method. As long as the number of classes is low, the system seems to be feasible. 
There is a trade-off between good manageability (few classes) and quality (many 
classes). 

Fig. 3. Based on the image blocks (see Fig. 2) a classification by means of extracted 
image properties (i.e. similarities on image level) is performed. Depending on its class, 
each block is processed by a specialized compression method. 

In general, this adaptive system can be used for any compression method. In 
the context of intelligent image compression, as described in Sect. 3, an overall 
biologically inspired system, based on different neural networks can successfully 
be applied. 

Looking at the parameters of the particular compression methods leads to a 
number of possible approaches. 



4.1 Compression Ratio Adaptive Processing 

Complexity Measure. In some cases it may be hard or not applicable to 
specify the desired compression ratio in advance. An adaptation, depending on 
a complexity measure, solves this problem. The classification is based on a com- 
bined complexity measure (i.e. entropy, object-background separation, texture 
analysis), which can be derived from image features. Although there is no stan- 
dard definition, usually image complexity is defined as some ratio of background 
homogeneity and foreground clutter [18]. The main problem is to find image 
features, which describe its complexity with regard to a compression relevant 
complexity^. Furthermore, the computational effort should be reasonable, which 
makes an extensive feature extraction, just to obtain a complexity measure, 
rather impossible. 
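
As one example of an inexpensive complexity measure, the Shannon entropy of the gray-level histogram of a block can be computed as sketched below. The function name is hypothetical, and the sketch covers only entropy, not the combined measure mentioned above.

```python
import math
from collections import Counter


def block_entropy(pixels):
    """Shannon entropy (in bits) of the value histogram of one image block."""
    counts = Counter(pixels)
    total = len(pixels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    flat = [128] * 64                       # homogeneous block
    mixed = [v % 4 for v in range(64)]      # four equally frequent values
    print(block_entropy(flat), block_entropy(mixed))
```

A homogeneous block has entropy zero, while a block with many equally frequent values has high entropy and would be assigned a weaker compression.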

^ For example: the size of a jpg file depends on the complexity of the image. However, 
this holds only in connection with the cosine transformation being used for 
compression in jpg. A different compression algorithm may require a different 
complexity measure. 

Variable Image Decomposition. As an extension of the method described above, 
which uses fixed-size blocks, a variable image decomposition can be 
implemented. A complexity measure is used to find an optimal size of image blocks. 
These blocks do not have to be analyzed any further, because a complexity measure 
has already (indirectly) been applied. 

Apart from finding a relevant and manageable complexity measure, it seems 
to be difficult to set up the compressors. Since the compression ratio of the 
MLP is fixed by the number of neurons in the middle hidden layer (see Sect. 3), 
the single compressors must have a different neural topology. In case of variable 
blocks even the width of input / output layers is not fixed. 



4.2 Similarity Adaptive Compression 

Although a variable compression ratio, as described in the previous subsection, 
may have some advantages, its implementation and parametrization is rather 
difficult. And after all, a fixed compression ratio has advantages, too: 

- a more precise estimation of the required storage capacity is possible, 

- image blocks do not have to be classified according to a hardly manageable 
  complexity measure but by an easily implementable similarity criterion, 

- the compressor (i.e. neural network) topology is fixed. 

The entire system, consisting of classification and compression / decompres- 
sion, can completely be set up by a couple of neural networks of two different 
architectures. 



Classification. The classification is based on similarities on image level. This 
relieves the system of complex calculations. A Self-Organizing Map (SOM) can 
be used to perform the classification [19], [20]. For that purpose the SOM is 
unsupervised trained with a number of typical images. After this training, some 
classes are formed, which now can be used to pass the image blocks to the 
corresponding compressor. 
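
The classification step can be sketched with a very small Self-Organizing Map. The sketch makes several simplifying assumptions that are not taken from the text: a hard neighborhood based on grid distance, linearly decaying learning rate and radius, and hypothetical function and parameter names. The index of the best matching unit serves as the class of a block.

```python
import random


def best_unit(vector, units):
    """Index of the unit whose weight vector is closest (squared Euclidean)."""
    return min(range(len(units)),
               key=lambda u: sum((a - b) ** 2 for a, b in zip(vector, units[u])))


def train_som(samples, rows=3, cols=3, epochs=20, rate=0.5, seed=0):
    """Unsupervised training of a rows x cols map on block vectors in [0, 1]."""
    rng = random.Random(seed)
    dim = len(samples[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs             # shrinks rate and radius
        radius = max(rows, cols) / 2.0 * decay
        for vector in rng.sample(samples, len(samples)):
            win_row, win_col = divmod(best_unit(vector, units), cols)
            for u, weights in enumerate(units):
                u_row, u_col = divmod(u, cols)
                if abs(u_row - win_row) + abs(u_col - win_col) <= radius:
                    for k in range(dim):
                        weights[k] += rate * decay * (vector[k] - weights[k])
    return units


if __name__ == "__main__":
    rng = random.Random(1)
    dark = [[rng.uniform(0.0, 0.2) for _ in range(4)] for _ in range(10)]
    bright = [[rng.uniform(0.8, 1.0) for _ in range(4)] for _ in range(10)]
    som = train_som(dark + bright)
    print(best_unit(dark[0], som), best_unit(bright[0], som))
```

After training, each block would be passed to the compressor associated with its class index.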

Other unsupervised neural architectures, e.g. Adaptive Resonance Theory 
(ART) [21], [22], seem to be suitable as well, but have not been tested yet. 

The actually utilized classification system determines the properties of the 
allocation of classes as well as the distribution of image blocks to a corresponding 
class. Fig. 4 shows the effects of both a proper and a deliberately false classifi- 
cation using an SOM. 



Compression / Decompression. All compressors are realized by the 
auto-associative Multiple-Layer Perceptrons described in Sect. 3. Since each network 
receives only blocks of the same class, it develops into an expert for a particular 
and very selective image (block) content. This way, the reconstruction error can 
be kept rather low (see Sect. 4.3). 

Fig. 4. Demonstration of the importance of a proper classification. From left to right: 
original image, restored image with correctly assigned class (optimized coder), restored 
image with randomly chosen class (non optimized coder). 



4.3 Lossy Image Compression Results 

While Tab. 2 in Sect. 2 summarized the results for lossless file based compression, 
this section compares practical results of lossy compression for both file based 
and neural systems. The key question of whether the image degradation 
caused by a lossy compression method is still tolerable at all, and if so, to what 
extent, has to be answered in the context of the image processing next in line 
(succeeding the compression / decompression). Here, some problem dependent 
fine-tuning will definitely be required. 

The image data base is still the same. For samples see Fig. 1. The results for 
the reconstruction error and partly also for the compression ratio slightly depend 
on the actually used images but indicate the general trend. However, significantly 
different images may lead to varying results, especially for the adaptive methods. 
The computation time^ strongly depends on numerical implementation details 
and will clearly vary. It should only be taken as a rough indication. 

A Root-Mean-Square (RMS) error was used to evaluate the reconstruction 
quality. This seems to be suitable and sufficient, even though alternative mea- 
sures, such as maximal deviation, are conceivable and maybe more qualified for 
analysis purposes. If image degradation has to be judged relating to a special 
application driven criterion, the standard RMS measure can be substituted by 
a more specialized one. 
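
The two error measures mentioned above can be sketched as follows. The sketch assumes that original and restored image are given as flat sequences of normalized pixel values of equal length; the exact normalization used for the reported results is not stated in the text, and the function names are hypothetical.

```python
import math


def rms_error(original, restored):
    """Root-mean-square deviation between two equally long pixel sequences."""
    if len(original) != len(restored):
        raise ValueError("images must have the same number of pixels")
    squared = sum((a - b) ** 2 for a, b in zip(original, restored))
    return math.sqrt(squared / len(original))


def max_deviation(original, restored):
    """Largest absolute pixel difference, an alternative quality measure."""
    return max(abs(a - b) for a, b in zip(original, restored))


if __name__ == "__main__":
    original = [0.0, 0.25, 0.5, 1.0]
    restored = [0.0, 0.30, 0.5, 0.9]
    print(rms_error(original, restored), max_deviation(original, restored))
```

The RMS error averages over all pixels, whereas the maximal deviation reveals isolated large errors that the average may hide.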

The stand-alone MLP network has been tested with 8 x 8 image blocks and 
a single hidden layer (8 ... 32 neurons) as well as three hidden layers 
((16-8-16) ... (32-16-32) neurons). The results show that one hidden layer is 
sufficient. More layers just slow down the compression / decompression without 
significantly better reconstruction errors. 

The compression ratio adaptive method has been implemented with entropy 
as complexity measure. The compression ratio and thus the reconstruction error 
strongly depend on a number of parameters. The same is true for the similarity 
adaptive compression. The SOM size has been varied between 3x3 and 7x7, 
the corresponding MLP networks as described above. 

^ For the neural networks based methods it is the recall time (on-line). 

Table 3. Results of different lossy or nearly lossless compression methods. In order 
to achieve statistical reliability, in addition to the shown test images (see Fig. 1) a 
number of similar images of the same data set was used. The results are either simply 
averaged (in case of low variance) or the range is given explicitly. 

Compression method              Image size ratio   Reconstruction error (RMS), approx.   Compression time [sec]
Stand-alone MLP                 0.50               3 x 10^-[?]                           ≈ 1
                                0.125              5 x 10^-[?]                           < 1
Compression ratio adaptive
  (complexity measure)          0.10 ... 0.25      [illegible]                           ≈ 3
  (variable image decomp.)      0.10 ... 0.45      4 x 10^-[?] ... 9 x 10^-[?]           4 ... 6
Similarity adaptive             0.125              [illegible]                           ≈ 1.5
JPEG (max. quality)             0.15 ... 0.20      6 x 10^-[?]                           < 1
JPEG (max. qual., progress.)    0.17 ... 0.23      [illegible]                           < 1
JPEG2000                        ≈ 0.30             5 x 10^-[?]                           10

[?] and [illegible] mark values that are not legible in the source. 

In competition to the neural based methods, JPEG and JPEG2000 as lossy 
respectively nearly lossless algorithms, have been tested with the same images. 
The JPEG quality coefficient was set to maximum and the JPEG2000 was run 
with the -lossless option. For example, the left image of Fig. 1 leads to a Frobe- 
nius norm of the color triple of [179, 179, 516] at an image size of 5.356.000 pixels 
per color channel. 

As the numerical values given in Tab. 3 demonstrate, JPEG2000 offers the best 
results regarding the reconstruction error. This seems to be obvious 
due to its smooth transition from arbitrarily adjustable lossy compression up 
to a lossless mode. Using the -lossless option, JPEG2000 typically leads to a 
compression of 1 : 3. This would be appropriate, were it not for its rather 
slow processing (about 10 seconds). The tested adaptive methods are significantly 
faster and at the same time offer a higher compression. However, this 
comes with a worse reconstruction error. 

Though the stand-alone MLP is fast, it is even outperformed by the old 
JPEG algorithm. The differences between the several adaptive variants are not 
very significant. Within a certain but relatively wide range, each of the adaptive 
methods can be adjusted according to the user’s requirements. Thus, the major 
differentiating factor is rather their different handling, considering the statements 
in Sect. 4.1 and 4.2. 



5 Conclusions 



This paper presents a review of file based image compression methods as well as 
some intelligent neural networks based adaptive systems. All methods are judged 
against the background of image based biomedical High-Throughput Screening 
(HTS) with its specific properties, such as a large number of similar images which 
have to be archived, often in a non time-critical manner. 

Since the archived images are subject to a further scientific processing, usu- 
ally a noticeable image degradation caused by coding is not acceptable. Depend- 
ing on the specific task of a particular HTS application, either the reconstruction 
error of decoded images or the computation time for compression / decompres- 
sion is the more crucial point. In every case, a trade-off between compression 
ratio, reconstruction error and computation time has to be found. 

The most straightforward approach is to use a lossless image file format, 
where LZW based tif leads to the best compromise of compression ratio, com- 
putation time and availability. However, a compression of more than about half 
the original image size is not possible for typical biomedical images. 

If a higher compression is desired, lossy image compression with an inevitable 
degradation of the original images is the only solution. Interestingly, but after 
a closer look at the details not at all surprisingly, the best quality, at least at 
an acceptable time effort, is not achieved by an adaptive and content based 
method, but rather utilizing the wavelet based nearly lossless JPEG2000 cod- 
ing. It is I to 1 1 decimal powers better and very easily manageable. In spite 
of its pretty good and probably in almost all practical applications acceptable 
reconstruction error, it offers compressed images of less than about one third of their 
original size. So far it seems to be the best solution. However, it is rather slow 
compared to all other investigated methods. That means, the more the above 
mentioned trade-off turns out to be time critical, the more relevant becomes one 
of the suggested adaptive methods. 

Apart from considering the computation time, it turns out that for even 
higher compression rates the lossy variant of JPEG2000 or the (somewhat faster) 
adaptive intelligent algorithms using artificial neural networks may be the only 
solution. Both are very well scalable. JPEG2000 and the stand-alone as well as 
the similarity adaptive Multiple-Layer Perceptrons have the advantage to offer 
an a-priori adjustable fixed compression rate. This may be a real advantage, if 
a particular image data set must not exceed a fixed storage capacity. 


Abstract. Keywords are a simple way of describing a document, giving the 
reader some clues about its contents. However, sometimes they only categorize 
the text into a topic, in which case a summary is more useful. Keywords and abstracts are 
common in scientific and technical literature but most of the documents 
available (e.g., web pages) lack such help, so automatic keyword extraction and 
summarization tools are fundamental to fight against the “information over- 
load” and improve the users’ experience. Therefore, this paper describes a new 
technique to obtain keyphrases and summaries from a single document. With 
this technique, inspired by the process of protein biosynthesis, a sort of “docu- 
ment DNA” can be extracted and translated into a “significance protein” which 
both produces a set of keyphrases and acts on the document highlighting the 
most relevant passages. These ideas have been implemented into a prototype, 
publicly available in the Web, which has obtained really promising results. 



1 Introduction 

As the sayings go, “Time is Money” and “Information is Power”. So, most of us 
want to gain the most power (i.e., information or knowledge) at the lowest 
possible cost (i.e., as soon as possible and with relatively little effort). To 
accomplish this, many communities make use of guidelines to write “easy-to-read” 
documents. These guidelines can be as simple as attaching an abstract and/or a 
list of keywords at the beginning of each document. 

However, most of the documents available on a daily basis have neither abstract 
nor keywords. Examples of these “time consuming” documents are e-mail messages, 
web pages, newsgroup posts, etc. On the other hand, such documents are provided by 
digital means, so, at least, they are suitable for automatic processing. In fact, there 
are many, very different Natural Language Processing (NLP) techniques to help us 
sort through this information overload (e.g., language identification [1][2], document 
clustering [3][4], keyword extraction [5][6] or text summarization [7]). 

Some of these techniques require human supervision [5] while others don’t 
[1-4][6][7]. Several don’t require training [6] but others do [1-5][7]. Some rely only 
on statistical information [1-4][6] and others employ complex linguistic data [5][7]. A 
few use only one document [6] while others need a document corpus [1-5][7]. 

The question becomes: wouldn’t it be great to use only one technique to carry out 
several of these tasks? Ideally, it should be extremely simple (i.e., it should rely only 
on free text instead of linguistic data), fully automatic (i.e., it should need neither 
human supervision nor ad hoc heuristics) and scalable (i.e., feasible with both single 
documents, such as a web page, and document corpora, such as web sites). 

Biology has inspired many computational techniques that have proven feasible and 
reliable (e.g., genetic algorithms or neural and immune networks). So, in trying to 
find such a technique, we also turned to biology. Among living beings each individual 
is defined by its genome, which is composed of chromosomes, which are divided into 
genes, which in turn are built from nitrogenous bases. Likewise, if we consider a 
document as an individual from a population (a document corpus), we can see that 
documents are composed of passages, divided into sentences built upon words. Following this 
analogy, we hypothesized that two documents written in the same language or 
semantically related would show similar “document genomes”. This paper will 
explain how these document genomes can be extracted and translated into 
“significance proteins” (i.e., keywords, keyphrases and summaries). 



2 Biological Definitions 

Since our proposal is heavily inspired by biological phenomena some definitions are 
provided to clarify various aspects of the techniques and algorithms described later in 
this paper. 

Definition 1: Nitrogenous bases 

Molecules involved in the pairing up of DNA and RNA strands. They include adenine 
(A), thymine (T), cytosine (C), guanine (G) and uracil (U). Uracil only exists in RNA 
replacing thymine which is only present in DNA. Adenine, cytosine and guanine are 
common to both DNA and RNA. The possible base pairs are AT or AU and GC. 

Definition 2: Nucleotide 

A nucleotide is an organic molecule composed of a nitrogenous base, a pentose sugar 
(deoxyribose in DNA or ribose in RNA), and a phosphate or polyphosphate group. 
Nucleotides are the monomers of nucleic acids. 

Definition 3: Deoxyribonucleic acid (DNA) 

DNA is a polymer and the main chemical component of chromosomes. It is usually 
the basis of heredity because parents transmit copied portions of their own DNA to 
offspring during reproduction, propagating their traits. DNA is a pair of chains of 
nucleotides entwined into a “double helix”. In this double helix, two strands of DNA 
come together through complementary pairing of the nucleotides’ bases. Because of 
this, DNA is usually represented as a single text string (e.g. ...GGCGATACATG...), 
which has been of primary importance for the development of bioinformatics. 







D. Gayo-Avello, D. Alvarez-Gutierrez, and J. Gayo-Avello 



Definition 4: Ribonucleic acid (RNA) 

RNA is, like DNA, a nucleic acid, although it differs slightly from DNA in both 
structure and function. As explained above, RNA is composed of the same bases as 
DNA except that uracil replaces thymine. The structural differences of RNA give the 
molecule greater catalytic versatility and help it to perform its many roles in the 
transmission of genetic information from DNA (by transcription) and into protein 
(by translation). 

Definition 5: Transcription 

The process of transcribing genetic information from DNA into a messenger RNA 
molecule using the DNA molecule as a template. Transcription is the step that 
precedes protein biosynthesis. 

Definition 6: Messenger RNA (mRNA) 

mRNA is transcribed directly from a gene’s DNA and carries the code for a particular 
protein from the nucleus to a ribosome in the cytoplasm and acts as a template for the 
formation of that protein. mRNA is a single- stranded molecule. 

Definition 7: Transfer RNA (tRNA) 

A relatively small RNA molecule that transfers a particular amino acid to a growing 
polypeptide chain at the ribosomal site during translation. 

Definition 8: Amino acid 

Amino acids are the constituent molecules of proteins. There are 20 amino acids 
directly expressed by means of DNA. However, a protein can contain amino acids 
that differ from these twenty; if this is the case, the different amino acid has been 
transformed after translation. 

Definition 9: Protein 

Proteins are polymers consisting of one or more strings of amino acids. Each string 
folds into a 3D structure, and there are four different levels of protein structure. 
However, for our purposes, we are interested in only two of them: (1) Primary 
structure: the linear amino acid sequence forming a polypeptide. (2) Quaternary 
structure: the association of multiple polypeptide subunits to form a functional 
protein. The primary structure is made during translation and the higher structures 
are reached during the protein folding process. Proteins are primary components of 
living organisms; they can be used as hormones, enzymes, structural elements or 
even to obtain energy. 

Definition 10: Ribosome 

A ribosome is a structure composed of RNA (ribosomal RNA or rRNA) and proteins 
that can translate mRNA into a polypeptide chain (usually a protein). Ribosomes are 
found in the cytoplasm of all cells and consist of two subunits. 
Definition 11: Translation or protein synthesis 

Protein synthesis involves three steps: (1) preparing tRNA molecules for use by the 
ribosome; (2) attaching the ribosome to the mRNA; and (3) the initiation, elongation 
and termination phases of translation, where an amino acid chain forming the primary 
structure of a protein is constructed. The process of translation will be described in 
more detail in the next section. 

Definition 12: Bioinformatics 

Information technology as applied to life sciences. For instance, the techniques used 
for the collection, storage, retrieval, data mining and analysis of genomic data. Other 
applications include sequence alignment, protein structure prediction, etc. 



3 Synthesis, Folding, and Functions of Proteins 

As defined above, DNA is capable of encoding 20 different types of amino acids; 
these are the basic components of proteins which play essential roles in almost every 
biological process. Therefore, cells need to produce proteins using their DNA as a 
kind of “blueprint”. 

However, DNA is not very chemically versatile and, moreover, is too valuable for 
the cell to produce the needed proteins by working on it directly. Because of this, to 
synthesize any protein the cell must first copy a portion of its DNA (i.e., the gene 
encoding the protein) into a single-stranded molecule of mRNA which is sent to the 
cytoplasm. There, it will be used by ribosomes as a template to compose the final 
protein. This prior step is called transcription, while the process of obtaining a 
protein from the mRNA molecule is called protein synthesis or translation. 

As will be shown later in this paper, our proposal is loosely inspired by the 
information encoding capabilities of DNA, the transcription of DNA into mRNA, its 
translation into the primary structure of a protein and the folding of this protein to 
reach its fully functional form. Therefore, the following paragraphs offer a more 
thorough description of the translation and folding processes. 



3.1 The Translation Process 

Protein synthesis begins with the attachment of the small ribosomal subunit to the 
mRNA string. Then the initiator tRNA molecule binds to the start codon¹ AUG. This 
step is called initiation (see Fig. 1); it is followed by the elongation phase. 

The elongation phase starts when the large ribosomal subunit is attracted by the 
initiator tRNA and binds to the small subunit, completing the ribosome (see Fig. 2). 
During this phase the ribosome moves along the mRNA string one codon at a time. 
Each of these codons in the mRNA has a complementary anticodon in tRNA 
molecules which, as we already know, carry amino acids. This way the polypeptide 
sequence, dictated by DNA and transmitted by mRNA, continues growing (see Fig. 3) 
until the ribosome reaches a stop codon (UAA, UAG or UGA). At that moment 
the translation process enters the termination phase. 

¹ A codon is a sequence of three adjacent nucleotides in mRNA determining the binding of a 
particular amino acid (carried by a tRNA molecule) or the signal to stop the translation. 




Fig. 1. Initiation of the translation process. Fig. 2. Start of the elongation phase. 



When the ribosome reaches the stop codon no tRNA is attracted, the ribosome 
dissociates and both ribosomal subunits leave the mRNA. The product of this process 
is a polypeptide, that is, the primary structure of a protein (see Fig. 4). 
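
The three phases described above can be illustrated with a minimal Python sketch. It is only a toy: the function name `translate` and the sample mRNA string are made up for this illustration, and the codon table is a deliberately tiny excerpt of the standard genetic code rather than the full table. 

```python
# A deliberately tiny excerpt of the standard genetic code (illustrative only).
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "GCU": "Ala", "AAA": "Lys"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Return the amino acid chain encoded between the start and stop codons."""
    start = mrna.find("AUG")  # initiation: locate the start codon
    if start == -1:
        return []
    chain = []
    for i in range(start, len(mrna) - 2, 3):  # elongation: one codon at a time
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:  # termination: no tRNA matches a stop codon
            break
        chain.append(CODON_TABLE.get(codon, "?"))  # "?" marks codons not in the excerpt
    return chain


# Hypothetical mRNA fragment: two leading bases, AUG, three codons, then UAA.
print(translate("GGAUGUUUGGCAAAUAAGCU"))  # ['Met', 'Phe', 'Gly', 'Lys']
```

The returned list plays the role of the polypeptide, that is, the primary structure only; nothing in this sketch models folding. 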




Fig. 3. End of the elongation phase. Fig. 4. Termination of the translation process. 




3.2 Protein Folding and Functions 

As explained in Definition 9, proteins must fold into a 3D structure to perform 
their many functions. All of the information required to reach the final folded form is 
contained in the primary structure, since proteins fold into low-energy configurations 
depending on the interactions between their constituent amino acids. Predicting 
protein folding from the amino acid sequence has been a major problem for over a 
decade and many techniques have been proposed (e.g., LINUS [8] or 
Folding@home² [9]). 

Many of the ideas behind these techniques, such as the maximization of entropy during 
the folding process [8], have greatly influenced some of the algorithms described later 
in this paper. The functions of some proteins, such as hormones and enzymes, were 
also especially influential, since we were interested not only in obtaining “document 
proteins” (i.e., keyphrases) but also in the possibility of such “proteins” having active 
effects on the original document to achieve automatic summaries. 

² http://www.stanford.edu/group/pandegroup/folding 



4 A DNA for Natural Language? 

Now that we have broadly described the main biological phenomena in which our 
techniques are rooted, we can define our proposal more clearly. Let’s start with 
the concept of “document genome”. To see what a “document genome” looks like, 
let’s consider the extremely short text shown in Fig. 5. 

    The rain in Spain stays mainly in the plain. 

Fig. 5. An example document. 

    ain-(3) -in-(2) ... he-p(1) e-pl(1) -pla(1) plai(1) lain(1) 

Fig. 6. Partial list of 4-grams for the previous document. It is reverse-ordered by 
simple frequency and blanks have been replaced by hyphens. 

A common technique to analyze texts is based on the use of n-grams, which are 
simply sequences of length n. They can include either words or characters and the 
items do not need to be contiguous. However, frequently the term n-gram refers to 
slices of n adjoining characters. For the purposes of this paper we will use this 
definition of n-gram throughout. Moreover, while it is common when working with 
n-grams to obtain a variety of different lengths, our document DNA will use fixed 
length n-grams³, usually 4-grams or 5-grams (see Fig. 6). 
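
As a rough illustration of this slicing, the following Python sketch counts the character 4-grams of the example document. The function name `char_ngrams` and the preprocessing are assumptions of the sketch, not something the text specifies: non-alphabetic characters are mapped to blanks and one leading blank is added, which yields the frequencies listed in Fig. 6. 

```python
import re
from collections import Counter


def char_ngrams(text, n=4):
    """Count contiguous character n-grams of a text."""
    # Assumption: anything that is not an ASCII letter becomes a blank, and
    # one blank is prepended so the first word also has a left boundary.
    cleaned = " " + re.sub(r"[^A-Za-z]", " ", text)
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


counts = char_ngrams("The rain in Spain stays mainly in the plain.")
# Show the two most frequent 4-grams, writing blanks as hyphens as in Fig. 6.
for gram, freq in counts.most_common(2):
    print(gram.replace(" ", "-"), freq)
# ain- 3
# -in- 2
```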

By comparing the “most frequent” n-grams in two different sequences it is possible 
to perform language identification and, to some extent, document categorization. 
However, these “most frequent” n-grams are usually obtained by inspection [3]. 

Therefore, n-grams have commonly been used to perform basic analysis of natural 
language and are a feasible way of performing language identification. However, 
these lists of n-grams with their corresponding frequency are quite distinct from DNA 
in living beings. On the other hand, since we are working with fixed length n-grams 
we can easily construct a string including all the n-grams from the document, 
repeating each n-gram as many times as specified by its simple frequency and 
ordering them alphabetically (see Fig. 7). 

This form of pseudo-DNA is well suited to comparing the genomes of two or 
more documents. The very same sequence alignment algorithms employed 
in bioinformatics can be applied to these pseudo-DNA strings to perform language 
identification and document clustering and, in fact, these algorithms can be greatly 
simplified by taking advantage of the alphabetical ordering of the “genes”. 

³ By doing this the algorithms are much simpler. It would be interesting to study the feasibility 
of mixing n-grams of different sizes into a single pseudo-DNA string. 



ain-ain-ain-ainlays-e-ple-rahe-phe-r-in--in-in-iin-Sin-s 
in-tinlylainly-i-maimainn-inn-Spn-stn-thnly-pain-plaplai 
-rairains-ma-SpaSpai-stastaytaysThe-the--The-they-inys-m 



Fig. 7. Example document genome. Bold type is used only for the sake of clarity. 



While this technique appears to be equivalent to the “out-of-place” measure used in 
[3] to compare n-gram profiles, the concept of document distance is, perhaps, more 
easily understood using the pseudo-DNA and sequence alignment algorithms. 
Currently the application of this technique to language identification and topic 
clustering is a work in progress, but it has many more possible applications, such as 
keyphrase extraction and text summarization. 

This pseudo-DNA needs, however, some improvements. Since the simple 
frequency depends on the length of the document, we cannot use it to determine the 
number of repetitions of each n-gram. Instead, we must use the relative frequency 
and perform some scaling to transform the floating point values into integers; 
a logarithmic scale appears to be an adequate solution. What is more, the simple 
frequency is not the most relevant measure associated with each n-gram. For instance, 
continuing with the prior example document, we cannot say that Spain and mainly 
are equally relevant although the 4-grams Spai and inly share the same frequency. 

Working with bigrams there are various measures to show the relation between 
both elements of a pair (e.g., mutual information, Dice coefficient, φ² coefficient, 
Loglike, etc.). Such measures provide much more interesting information than simple 
frequency, but they cannot be applied straightforwardly to n-grams without a 
generalization. For this proposal we have chosen the Fair Specific Mutual Information 
(SI_f) [10] (see equations 1 and 2). 



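\[ SI_f(w_1 \ldots w_n) = \log \left( \frac{p(w_1 \ldots w_n)}{Avp} \right) \qquad (1) \]

\[ Avp = \frac{1}{n-1} \sum_{i=1}^{n-1} p(w_1 \ldots w_i) \cdot p(w_{i+1} \ldots w_n) \qquad (2) \]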

In this measure, w_1...w_n represents an n-gram, while w_1...w_i and w_{i+1}...w_n are 
two fragments of that n-gram (e.g., for Spai, S and pai, respectively). In addition, 
p(w_1...w_n) is the probability (i.e., relative frequency) of the full n-gram, while 
p(w_1...w_i) and p(w_{i+1}...w_n) are the appearance probabilities of each segment, 
not only as part of the original n-gram⁴. 

⁴ For instance, in the case of the 4-gram inly, the segment in appears as left-sider in five 
4-grams: [in-S]pain, ra[in-i]n, Spa[in-s]tays, [in-t]he and ma[inly]. 
A two-step algorithm is provided later in this paper (see Appendix). The first step, 
precalculateMatrix, precalculates a table of data from a sequence of n-grams. The 
second one uses that table to calculate the SI_f for a particular n-gram. For instance, 
by applying these algorithms we found that the significances for Spai and inly are 
2.013 and 1.975, respectively. 
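
The two steps can be sketched in Python as follows. This is a minimal reading of equations 1 and 2 and of the Appendix pseudocode, not the authors' implementation: the function names are hypothetical, the logarithm is taken in base 10 as in the Appendix, and the preprocessing (non-alphabetic characters mapped to blanks, one leading blank) is an assumption chosen because it produces the 42 4-grams shown in Fig. 7. 

```python
import math
import re
from collections import Counter


def char_ngrams(text, n=4):
    # Assumed preprocessing: non-letters become blanks, one leading blank.
    cleaned = " " + re.sub(r"[^A-Za-z]", " ", text)
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def precalculate(ngrams):
    """Step 1: accumulate the weight of every left and right segment."""
    left, right = Counter(), Counter()
    for gram, weight in ngrams.items():
        for i in range(1, len(gram)):  # every split point of the n-gram
            left[gram[:i]] += weight
            right[gram[i:]] += weight
    return left, right, sum(ngrams.values()), sum(right.values())


def si_f(gram, ngrams, tables):
    """Step 2: Fair Specific Mutual Information of one n-gram (eqs. 1 and 2)."""
    left, right, total_ngrams, total_segments = tables
    p_full = ngrams[gram] / total_ngrams
    avp = sum(
        (left[gram[:i]] / total_segments) * (right[gram[i:]] / total_segments)
        for i in range(1, len(gram))
    ) / (len(gram) - 1)
    return math.log10(p_full / avp)


ngrams = char_ngrams("The rain in Spain stays mainly in the plain.")
tables = precalculate(ngrams)
print(round(si_f("Spai", ngrams, tables), 3))  # 2.013
print(round(si_f("inly", ngrams, tables), 3))  # 1.975
```

Under these assumptions the sketch yields the two significance values quoted above for the example document. 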

However, the values themselves are of little importance. What is really important is 
that they allow us to rank the n-grams according to their relevance. From this ranking 
the document DNA is built. But, first, the ranking must be scaled to the range 1...K 
(K being a power of 2). This way the most relevant n-gram will appear K times in the 
pseudo-DNA string while the least relevant will appear only once. As explained 
above, all n-grams are placed in alphabetical order within the pseudo-DNA string. To 
summarize this section we provide the following definitions. 

Definition 13: N-Gram 

A sequence of ISO-8859-1 (Latin 1) alphabetic characters, either lowercase or 
uppercase, or blanks. The length of the sequence (i.e., the size of the n-gram) is a 
parameter of the document DNA since all the n-grams within a document DNA must 
be equal in length. The most common sizes are 4-grams and 5-grams. 

Definition 14: Document Gene 

A document gene is a variable length repetition of a unique n-gram. The number of 
appearances of the n-gram in the gene depends on its ranking within the original 
document’s n-gram sequence according to the Fair Specific Mutual Information 
measure (SI_f). A gene has at least one n-gram and at most K, a power of 2. K is a 
parameter of the document DNA. 

Definition 15: Document DNA 

A document DNA is a text string representing a sequence of document genes in 
alphabetical order without any separator. 
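
Definitions 13 to 15 can be put together in a short Python sketch. Several points are assumptions, since the text does not fix them: the function name `document_dna` is hypothetical, ranks are mapped linearly onto 1...K, and genes are ordered by plain code-point order (so uppercase sorts before lowercase). The significances for Spai and inly are the ones quoted in Section 4; the other two values are invented for the example. 

```python
def document_dna(significance, k=8):
    """Build a pseudo-DNA string from a {n-gram: significance} mapping."""
    ranked = sorted(significance, key=significance.get)  # least relevant first
    last = max(len(ranked) - 1, 1)
    # Assumption: ranks are mapped linearly onto 1..K repetitions.
    repeats = {g: 1 + round(r * (k - 1) / last) for r, g in enumerate(ranked)}
    # Genes are concatenated in alphabetical order without any separator.
    return "".join(g * repeats[g] for g in sorted(repeats))


# "Spai" and "inly" use the values from the text; "rain" and "ain " are made up.
toy = {"Spai": 2.013, "inly": 1.975, "rain": 1.5, "ain ": 1.2}
print(document_dna(toy, k=4).replace(" ", "-"))
# SpaiSpaiSpaiSpaiain-inlyinlyinlyrainrain
```

The most relevant n-gram appears K times and the least relevant once, as required by Definition 14. 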



5 Synthesis, Folding, and Effects of Significance Proteins 

According to Definitions 13 to 15, we can extract from a document written in any 
western language a “genome” which encodes the “significance” of the different 
n-grams occurring in that document. If this “genome” is in some way similar to the 
DNA of living organisms, it should hide some valuable information within it. Such 
information could be extracted by means of a translation process similar to protein 
biosynthesis. The results of this translation process (i.e., the significance protein) 
would ideally fold into a compact representation of the most relevant information 
in the document (i.e., a list of keyphrases) and should also 
be capable of modifying the document itself (i.e., providing an automatic summary). 
5.1 Document’s DNA Translation into a Significance Protein 

To implement such ideas we must first find the counterparts of mRNA, tRNA 
and the ribosome in this scenario. Since the techniques shown in this paper are loosely 
inspired by protein biosynthesis, we can redefine some roles. This way, our proposed 
technique will use the elements shown in Table 1. 



Table 1. Biological elements and their computational counterparts. 

Biological element                      Computational element 
mRNA                                    Document’s plain text 
tRNA                                    Spliced document DNA 
Polypeptide chain (protein’s            Document chunks with significance weights 
primary structure) 
Ribosome                                The ribosomalTranslation algorithm 
Folded protein                          Document’s keyphrases 



The basic ideas behind this technique are quite straightforward: 

1. The document DNA encodes, by means of the different lengths of its genes, 
the significance of each n-gram from the document. This DNA can be spliced 
into chunks which will carry a “significance weight” attached to a specific 
n-gram, behaving in a similar way to tRNA (see Fig. 8). 

2. The plain text from the document doesn’t provide any information to the 
computer about the different relevance of each passage and word. However, it 
is well-suited to be sequentially processed and, thanks to our computational 
tRNA, a weight can be assigned to each n-gram from this text (see Fig. 9). 

3. We already know that real proteins fold into a low-energy conformation or, 
which is the same, a maximum entropy state [8]. In a similar fashion, the 
“significance weights” assigned to each n-gram from the document’s text will 
be accumulated while the mean significance per character continues growing. 
This way, a sequence of chunks of text with maximum significance will be 
obtained, being the equivalent to the polypeptide chain in protein biosynthesis. 

4. All these tasks will be carried out by an algorithm (i.e., our ribosome) shown 
in the Appendix. 
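
The steps above can be sketched in Python. This is one possible interpretation of the ribosomalTranslation pseudocode in the Appendix, not a faithful port: the accumulator is reset for every new chunk, positions whose n-gram carries no weight are skipped, and the next chunk starts where the previous one ended (our reading of the `undo` step). The function name, the toy `trna` table and its weights are hypothetical. 

```python
def ribosomal_translation(text, trna, n=4):
    """Split text into chunks, growing each while its mean significance rises."""
    chunks = []
    start = 0
    while start <= len(text) - n:
        total = trna.get(text[start:start + n], 0.0)
        if total <= 0:
            start += 1  # assumption: nothing significant starts here, move on
            continue
        end = start + n          # the candidate chunk is text[start:end]
        best = total / n         # mean significance per character
        while end < len(text):
            gram = text[end - n + 1:end + 1]  # n-gram ending at the next character
            new_total = total + trna.get(gram, 0.0)
            new_mean = new_total / (end + 1 - start)
            if new_mean <= best:
                break  # the mean stopped growing: close the chunk
            total, best, end = new_total, new_mean, end + 1
        chunks.append((text[start:end], best))
        start = end  # assumption: the next candidate starts after this chunk
    return chunks


# Toy "tRNA": significance weights attached to two 4-grams (made-up values).
trna = {"Spai": 3.0, "pain": 3.0}
print(ribosomal_translation("xxSpainxx", trna))  # [('Spain', 1.2)]
```

Each returned pair is a chunk of text with its mean significance per character, the counterpart of the polypeptide chain in Table 1. 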



Fig. 8. Document DNA is spliced into “tRNA molecules” carrying significance weights. 

Fig. 9. This tRNA attaches to the document’s text endowing it with significance values. 
5.2 Significance Protein Folding: Keyphrase Extraction 

The ribosomalTranslation algorithm splits the document’s text into a sequence of 
chunks with maximum significance. This sequence is analogous to the primary 
structure of a protein since it has all the information needed to reach the final 
functional form; however, it still has to undergo the folding process (see Fig. 10). 

This process is performed by the algorithm proteinFolding (see Appendix) which 
depends on three different algorithms: (1) selfAttract (see Appendix), (2) mergeKeys 
and (3) sliceKeys. The last two algorithms are quite simple: mergeKeys receives as 
input a sequence of weighted keyphrases, a threshold and a window size and merges 
all the keys weighted above the threshold and inside the specified window. sliceKeys 
simply drops any keyphrase below the specified threshold. 
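
As a small illustration of the filtering step, the Python sketch below implements sliceKeys together with the threshold that the proteinFolding pseudocode in the Appendix appears to use (mean plus π/2 times the deviation of the weights). The function names are hypothetical, reading “typical deviation” as the population standard deviation is an assumption, and the weights are made up. mergeKeys is not sketched because its merging rule is not detailed here. 

```python
import math
import statistics


def folding_threshold(keys):
    """Mean weight plus pi/2 times the deviation of the weights."""
    weights = list(keys.values())
    # Assumption: the deviation is the population standard deviation.
    return statistics.mean(weights) + (math.pi / 2) * statistics.pstdev(weights)


def slice_keys(keys, threshold):
    """Drop every keyphrase whose weight is below the threshold."""
    return {key: weight for key, weight in keys.items() if weight >= threshold}


# Made-up weighted keyphrases, for illustration only.
keys = {"Liberia": 30.0, "Taylor": 12.0, "said": 10.0, "Monday": 8.0}
print(slice_keys(keys, folding_threshold(keys)))  # {'Liberia': 30.0}
```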



5.3 Significance Protein Effects on the Document: Summarization 

Once the proteinFolding algorithm has produced the significance protein this can act 
on the document’s text to modify it, producing both a highlighted version of the 
document and an automatic summary. The algorithm to apply the protein over the 
document’s text, blindLight, and another algorithm to obtain a refined list of 
keyphrases from the folded protein are shown in the Appendix. 

of-Liberia's(22.06) -for-ECOWAS(21.83) a-Liberia's(21.79) 
Scott(21.75) Economic(21.32) House-Nyudueh(21.31) 
Liberia's(21.25) National(20.94) House-Nyudueh(20.91) 
A-West(20.87) International(20.82) Taylor's(20.82) ... 

Fig. 10. Top 12 most significant chunks from a web page⁵. Chunks that will likely come 
together to form the “folded protein” are shown in bold (blanks have been replaced by hyphens). 



6 The “blindLight” Prototype 

To test the feasibility of the previous ideas we developed a prototype and made it 
publicly available on the Web⁶. This prototype, called “blindLight”, receives a URL 
from the user and returns three different views of the original web page: a 
“blindlighted” version of the page, a list of keyphrases and an automatic summary. To 
reach such results the prototype performs a number of steps (see Fig. 11), which are 
described below. 

1. The web page is distilled to obtain the main contents as plain text. 

2. From these contents two genomic sequences are extracted (a 4-gram gene 
sequence and a 5-gram gene sequence; see Section 4). 

3. Both sequences are translated into significance proteins (Section 5.1) which, in 
turn, are folded (Section 5.2), yielding two lists of candidate keyphrases. 

4. The folded significance protein acts on the document to obtain its most 
relevant passages (Section 5.3). 

5. The results are shown to the user (see Fig. 12 and Fig. 13). 

⁵ http://edition.cnn.com/2003/WORLD/africa/08/04/liberia.peacekeepers/ 
⁶ http://www.purl.org/NET/blindlight 




Fig. 11. The different steps to “blindlight” a document. 



7 Partial Results and Conclusions 

Further and more thorough analysis of these keyphrase extraction and summarization 
techniques is needed. However, some proof-of-concept tests have been performed and 
the results, although they could certainly be improved, are promising. This 
preliminary experiment was carried out over 20 articles from three different journals: 
Psycoloquy and Behavioral & Brain Sciences Preprint Archive (both about cognition), 
and the Journal of the International Academy of Hospitality Research (about the hotel 
industry). All of them are available on the Web and include a list of keywords 
provided by the articles’ authors. 

The test was quite straightforward. First, the described algorithms were applied 
to “censored” versions of the papers (i.e., the abstract, list of keywords and 
references were manually removed from each article) to obtain a list of keyphrases. 
Then, the percentage of matches was computed for every execution. An automatically 
extracted keyphrase was a match if it was in the original list of keywords; however, 
keywords from the list not occurring in the text of the article were not taken into 
account. Table 2 shows an outline of the results. However, these results have two 
major caveats: (1) some authors provide many keywords not included in the article⁷; 
(2) our techniques obtain many more keyphrases apart from the matches, so experts 
from the respective fields should check their relevance to the article. 
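
The matching procedure can be sketched in Python as follows. The denominator (author keywords that actually occur in the article) and the case-insensitive exact matching are our assumptions about how the percentage was computed; the function name and the sample data are invented for the illustration. 

```python
def match_percentage(extracted, author_keywords, article_text):
    """Percentage of the author's in-text keywords found among the extracted ones."""
    text = article_text.lower()
    # Keywords that never occur in the article are not taken into account.
    usable = [k for k in author_keywords if k.lower() in text]
    if not usable:
        return 0.0
    found = {k.lower() for k in extracted}
    hits = sum(1 for k in usable if k.lower() in found)
    return 100.0 * hits / len(usable)


# Invented example: "qualia" is an author keyword that is absent from the text.
article = "We discuss memory and attention, and their relation to consciousness."
authors = ["memory", "consciousness", "attention", "qualia"]
extracted = ["Memory", "attention", "brain"]
print(round(match_percentage(extracted, authors, article), 1))  # 66.7
```

Note that such a score says nothing about the extracted keyphrases that are not in the author's list, which is the second caveat mentioned above. 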





Fig. 12. The actual layout of a web page. Fig. 13. The same web page “blindlighted”. 

Thus, a new proposal to perform some NLP tasks such as language identification, 
document clustering, keyphrase extraction and text summarization has been presented, 
with particular emphasis on the last two tasks. The described techniques are based 
primarily on biological phenomena, and they show that naive keyphrase extraction 
and summarization algorithms can work without human guidance, using no more 
than free text from a single document. 



Table 2. Results for the preliminary test on automatic keyphrase extraction. 

Journal      Mean author’s keywords   Mean      Max.      Min. 
             in contents              matches   matches   matches 
B&BSPA       78.5%                    40.4%     58.5%     33.0% 
Psycoloquy   55.3%                    36.8%     66.7%     0.0% 
JIAHR        87.5%                    50.2%     65.0%     31.4% 

⁷ One article from Psycoloquy provided eight keyphrases and only one (12.5%) was mentioned 
in the article. There were many lists of keywords with less than 20% of them in the contents. 
Appendix: Algorithms 

Algorithm precalculateMatrix(ngramSequence) 
Input: the list of weighted n-grams ngramSequence 
1. totalNgramsWeight ← 0 
2. for each n-gram ngram in ngramSequence do 
3.     totalNgramsWeight ← totalNgramsWeight + ngram.weight 
4.     firstSegments ← getFirstSegments(ngram) 
5.     secondSegments ← getSecondSegments(ngram) 
6.     from i ← 0 to size of firstSegments do 
7.         first ← firstSegments(i) 
8.         sec ← secondSegments(i) 
9.         precalcMatrix(first)(sec) ← ngram.weight 
10.        precalcMatrix(first)(ALL) ← precalcMatrix(first)(ALL) + ngram.weight 
11.        precalcMatrix(ALL)(sec) ← precalcMatrix(ALL)(sec) + ngram.weight 
12.    loop 
13. loop 
14. totalSegmentsWeight ← 0 
15. for each value weight in precalcMatrix(ALL) do 
16.    totalSegmentsWeight ← totalSegmentsWeight + weight 
17. loop 

Algorithm SI_f(ngram) 
Input: the object ngram storing an n-gram plus its weight (no. of appearances in 
document) 
1. pw1wn ← ngram.weight / totalNgramsWeight // totalNgramsWeight is global 
2. firstSegments ← getFirstSegments(ngram) 
3. secondSegments ← getSecondSegments(ngram) 
4. avp ← 0 
5. from i ← 0 to size of firstSegments do 
6.     w1wi ← firstSegments(i) 
7.     wiwn ← secondSegments(i) 
8.     pw1wi ← precalcMatrix(w1wi)(ALL) / totalSegmentsWeight 
9.     pwiwn ← precalcMatrix(ALL)(wiwn) / totalSegmentsWeight 
10.    avp ← avp + pw1wi * pwiwn 
11. loop 
12. avp ← avp / (sizeNgram - 1) // sizeNgram is global 
13. return log10(pw1wn / avp) // Fair Specific Mutual Information value for ngram 

Algorithm ribosomalTranslation(text, tRNA, sizeNgram) 
Input: the contents, text, from a document, a hash table tRNA which associates a 
significance value to an n-gram, the size of the n-grams sizeNgram 
1. i ← 0 
2. candidateKey ← λ // empty string 
3. oldCandidateSignificance ← 0 
4. while i < (length of text - sizeNgram) do 
5.     chunk ← substring(text, i, sizeNgram) 
6.     candidateKey ← merge(candidateKey, chunk) // Combines two strings 
7.     acumSignificance ← acumSignificance + tRNA(chunk) 
8.     newCandidateSignificance ← acumSignificance / length of candidateKey 
9.     if newCandidateSignificance > oldCandidateSignificance 
10.        oldCandidateSignificance ← newCandidateSignificance 
11.        i ← i + 1 
12.    else 
13.        candidateKey ← candidateKey - chunk 
14.        Call addNewChunk(candidateKey, oldCandidateSignificance) 
15.        candidateKey ← λ 
16.        oldCandidateSignificance ← 0 
17.        i ← undo(i, chunk) // Undo the last action to recover i value 
18.    end if 
19. loop 

Algorithm proteinFolding(textChunks) 
Input: the hash table textChunks that associates a significance weight to each text chunk 
1. newKeys ← selfAttract(textChunks) // Obtains a list of chunks with new weights 
2. threshold ← mean of newKeys weights + π/2 * standard deviation of newKeys weights 
3. window ← round(log2(size of newKeys)) 
4. newKeys ← mergeKeys(newKeys, threshold, window) // Merges similar keys 
5. threshold ← mean of newKeys weights + π/2 * standard deviation of newKeys weights 
6. newKeys ← sliceKeys(newKeys, threshold) 
7. return mergeKeys(newKeys, 0, size of newKeys) // Subset of weighted keyphrases 

Algorithm selfAttract(textChunks) 
Input: the hash table textChunks that associates a significance weight to each text chunk 
1. // Each chunk is assigned a “probability” of attraction based on its original ranking 
2. max ← (π/2 _ ] 
3. from i ← 0 to size of textChunks do 
4.     tmpPweights(i) ← π/2 * (π/2)') / max 
5. loop 
6. for each keyphrase key in textChunks do 
7.     pWeights(key) ← tmpPweights(ranking(key, textChunks)) 
8. loop 
9. // Each chunk is assigned a “probability” of attraction based on its contacts weights 
10. maxContact ← 0 
11. for each keyphrase key1 in textChunks do 
12.    contacts ← 0 
13.    for each keyphrase key2 in textChunks do 
14.        if partialMatch(key1, key2) 
15.            contacts ← contacts + textChunks(key2) / textChunks(key1) 
16.        end if 
17.    loop 
18.    if contacts > maxContact 
19.        maxContact ← contacts 
20.    end if 
21.    pContacts(key1) ← contacts 
22. loop 
23. for each keyphrase key in textChunks do 
24.    pContacts(key) ← (maxContact - pContacts(key)) / maxContact 
25. loop 
26. // The final attraction “probability” is pWeights * pContacts 
27. for each keyphrase key in textChunks do 
28.    pAttraction(key) ← pWeights(key) * pContacts(key) 
29. loop 
30. // New weights after attraction reinforcements are computed 
31. for each keyphrase key1 in textChunks do 
32.    w1 ← textChunks(key1) 
33.    for each keyphrase key2 in textChunks do 
34.        if partialMatch(key1, key2) 
35.            w1 ← w1 + pAttraction(key1) * pAttraction(key2) * textChunks(key2) 
36.        end if 
37.    loop 
38.    newWeights(key1) ← w1 
39. loop 
40. return newWeights // Input keyphrases with new weights after mutual reinforcement 
Algorithm enrichKeyphrases (keyphrases, document, maxiters) 

Input: the hash table keyphrases with weighted keyphrases, document, the contents of the 
document and the maximum number of iterations maxiters 

1. iters ← 0 
2. while new keyphrases are found and iters < maxiters do 
3.     for each keyphrase key in keyphrases do 
4.         leftSiders ← getLeftSideWords(key, document) 
5.         rightSiders ← getRightSideWords(key, document) 
6.         for each left side word l in leftSiders do 
7.             keyPairs(l)(key) ← keyPairs(l)(key) + 1 
8.         loop 
9.         for each right side word r in rightSiders do 
10.             keyPairs(key)(r) ← keyPairs(key)(r) + 1 
11.         loop 
12.         drop any keyPairs(a)(b) with a value of 1 
13.         keyphrases ← merge(keyphrases, keyPairs) 
14.         iters ← iters + 1 
15.     loop 
16. loop 
17. return keyphrases // The hash table updated with new rich keyphrases 

Algorithm blindLight (passages, richKeyphrases, K) 

Input: the list of passages from the text passages, the hash table richKeyphrases with 
weighted refined keyphrases and the K factor to modify the summary threshold 

1. for each passage p in passages do 
2.     acumWeight ← 0 
3.     matches ← 0 
4.     for each rich keyphrase key in richKeyphrases do 
5.         if key is in p 
6.             acumWeight ← acumWeight + richKeyphrases(key) 
7.             matches ← matches + 1 
8.         end if 
9.     loop 
10.     highlightWeights(p) ← matches * acumWeight / length of p 
11. loop 
12. threshold ← K * mean of non-zero highlightWeights 
13. for each passage p in passages do 
14.     if highlightWeights(p) > threshold 
15.         Call highlightHTML(p) 
16.         Call addPassageToSummary(p) 
17.     end if 
18. loop 
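
The blindLight listing above can be sketched in Python as follows. This is only an 
illustrative sketch under stated assumptions, not the authors' implementation: the 
function name blind_light is hypothetical, "key is in p" is read as a plain substring 
test, the length of a passage is taken as its character count, and the selected 
passages are returned as a list instead of being passed to highlightHTML and 
addPassageToSummary. 

```python
from statistics import mean


def blind_light(passages, rich_keyphrases, k):
    """Return the passages whose highlight weight exceeds k * mean non-zero weight."""
    highlight_weights = []
    for passage in passages:
        acum_weight = 0.0
        matches = 0
        for key, weight in rich_keyphrases.items():
            if key in passage:  # assumption: plain substring matching
                acum_weight += weight
                matches += 1
        # Assumption: an empty passage gets weight 0 (the listing does not cover it).
        length = len(passage)
        highlight_weights.append(matches * acum_weight / length if length else 0.0)

    non_zero = [w for w in highlight_weights if w > 0]
    if not non_zero:
        return []  # no passage matched any keyphrase
    threshold = k * mean(non_zero)
    return [p for p, w in zip(passages, highlight_weights) if w > threshold]
```

A larger K raises the threshold and therefore shortens the summary; a smaller K 
admits more passages. 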
Abstract. In this paper, we propose a trainable selective attention model that 
can inhibit an unwanted salient area and focus only on an interesting area in a 
static natural scene. The proposed model is implemented by combining a bottom-up 
saliency map model with an adaptive resonance theory (ART) network. The 
bottom-up saliency map model generates a salient area based on intensity, edge, 
color and symmetry feature maps, and a human supervisor decides whether the 
selected salient area is important. If the selected area is not interesting, the 
ART network learns and memorizes that area and generates an inhibition signal, 
so that the bottom-up saliency map model does not attend to an area with similar 
characteristics in the subsequent visual search process. Computer simulation 
results show that the proposed model successfully generates a plausible sequence 
of salient regions that pays no attention to unwanted areas. 



1 Introduction 

The human eye can focus on an attended location in an input scene and select an 
interesting object for the brain to process. Our eyes move to the selected objects very 
rapidly through saccadic eye movements. These mechanisms are very effective in 
processing high-dimensional data of great complexity. If a human-like selective 
attention function is applied to an active vision system, an efficient and intelligent 
active vision system can be developed. Selective attention is one of the most important 
functions that a next-generation humanoid robot needs in order to decide on an 
interesting object by itself for sociable learning through interaction with the 
environment or with human beings. Moreover, it is also crucial for a humanoid robot to 
have an incremental learning mechanism, such as incremental inhibition of unwanted 
areas. From this point of view, our proposed trainable selective attention model 
can provide a core technology for the autonomous incremental intelligence that will 
be required for next-generation information technology. 

In human-like selective attention, top-down (task-dependent) processing affects how 
the saliency map is determined, as does bottom-up (task-independent) processing [1]. 
In a top-down manner, the human visual system determines salient locations through 
perceptive processing such as understanding and recognition. It is well known that 
perception is one of the most complex activities in our brain. Moreover, top-down 
processing is so subjective that it is very difficult to model the processing mechanism 
in detail. With bottom-up processing, on the other hand, the human visual system 
determines salient locations from features based on basic information in an input 
image, such as intensity, color and orientation [1]. Bottom-up processing can be 
considered a function of primitive selective attention in the human visual system, 
since humans selectively attend to such salient areas according to various stimuli in 
the input scene. 

In previous work, Itti and Koch introduced a brain-like model to generate the 
saliency map. Based on Treisman’s result [2], they used three types of bases, namely 
intensity, orientation and color information, to construct a saliency map of a natural 
scene [1]. Koike and Saiki proposed that a stochastic winner-take-all (WTA) mechanism 
enables the saliency-based search model to let variations in relative saliency change 
search efficiency, owing to stochastic shifts of attention [3]. In a hierarchical 
selectivity mechanism, Sun and Fisher integrated visual salience from bottom-up 
groupings and the top-down attentional setting [4]. Ramstrom and Christensen 
calculated saliency with respect to a given task using a multi-scale pyramid and 
multiple cues. Their saliency computations were based on game theory concepts, 
specifically a coalitional game [5]. However, the weight values of the feature maps 
used to construct the saliency map are still determined artificially. Also, none of 
these models interacts with the environment, and consequently they cannot confirm 
whether a selected salient area is actually interesting. 

On the other hand, Barlow suggested that our visual cortical feature detectors 
might be the end result of a redundancy reduction process [6], in which the activation 
of each feature detector is supposed to be as statistically independent from the others 
as possible. We suppose that the saliency map is one of the results of redundancy 
reduction in our brain. The scan path, which is a sequence of salient locations, may be 
the result of the brain's effort to maximize information. In Sejnowski’s result using 
independent component analysis (ICA), redundancy reduction of natural scenes 
yields edge filters [7]. Buchsbaum and Gottschalk found opponent coding to be the 
most efficient way to encode human photoreceptor signals [8]. Wachtler and Lee 
applied ICA to hyperspectral color images and obtained color-opponent bases from 
the analysis of trichromatic image patches [9]. 

It is well known that our retina performs preprocessing such as cone opponent coding 
and edge detection [10], and that the extracted information is delivered to the visual 
cortex through the lateral geniculate nucleus (LGN). Symmetry information is also an 
important feature for determining a salient object, and it is related to the function of 
the LGN. Our bottom-up saliency map model considers the preprocessing mechanism 
of cells in the retina and the LGN, with the on-center and off-surround mechanism, 
before the redundancy reduction in the visual cortex. The saliency map is finally 
constructed by integrating the feature maps through ICA, which is the best 
way to reduce redundancy. Using the bottom-up saliency map model, we can 
obtain a sequence of salient areas. 

However, the bottom-up saliency map model may select unwanted areas because it 
generates salient areas based only on primitive features such as intensity, edge, 
color and symmetry. Human beings, on the other hand, can learn and memorize the 
characteristics of an unwanted area and inhibit attention to that area in subsequent 
visual search. In this paper, we propose a new selective attention model that mimics 
such a human-like selective attention mechanism, not only with a purely bottom-up 
process but also with an interactive process that skips unwanted areas in the 
subsequent visual search. To implement the trainable selective attention model, 
we use the bottom-up saliency map model in conjunction with an adaptive resonance 
theory (ART) network. It is well known that the ART model maintains the plasticity 
required to learn new patterns while preventing the modification of patterns that have 
been learned previously [11]. Thus, the characteristics of an unwanted salient area 
selected by the bottom-up saliency map model are used as input data to the ART model, 
which learns and generalizes the features of unwanted areas in natural scenes. In the 
training process, the ART network learns the uninteresting areas that a human 
supervisor decides on interactively, which differs from the conventional ART network. 
In testing mode, the vigilance parameter of the ART network determines whether a 
new input area is interesting, because the ART network has memorized the 
characteristics of the unwanted salient areas. If the vigilance value is larger than a 
threshold, the ART network inhibits the selected area in the bottom-up saliency map 
model so that the area is ignored in the subsequent visual search process. 

Section 2 describes the biological background of the proposed model. Section 3 
explains our bottom-up saliency map model. Section 4 presents the proposed 
trainable selective attention model using the ART network. Simulation results and 
conclusions follow. 



2 Biological Background 

In the vertebrate retina, three types of cells are important processing elements for 
edge extraction: photoreceptors, horizontal cells and bipolar cells [12, 13]. Edge 
information is obtained by these cells in the visual receptor and is delivered to the 
visual cortex through the ganglion cells and the LGN. The horizontal cell spatially 
smooths the transformed optical signal, while the bipolar cell yields the differential 
signal, which is the difference between the optical signal and the smoothed signal. The 
edge signal is detected from the output signal of the bipolar cell. 

On the other hand, a neural circuit in the retina creates opponent cells from the 
signals generated by the three types of cone receptors [10]. The R+G− cell receives 
inhibitory input from the M cone and excitatory input from the L cone. The opponent 
response of the R+G− cell occurs because of the opposing inputs from the M and L 
cones. The B+Y− cell receives inhibitory input formed by adding the inputs from the 
M and L cones, and excitatory input from the S cone. These preprocessed signals are 
transmitted to the LGN through the ganglion cells, and the on-center and off-surround 
mechanism of the LGN and the visual cortex intensifies the opponency [10]. Moreover, 
the LGN plays a role in detecting the shape and pattern of an object [10]. In general, 
the shape or pattern of an object carries symmetry information, and consequently 
symmetry information is one of the important features for constructing a saliency map. 
Even though the role of the visual cortex in finding a salient region is important, it is 
very difficult to model the detailed function of the visual cortex. Following Barlow’s 
hypothesis, we simply regard the role of the visual cortex as redundancy reduction. 



3 Bottom-Up Saliency Map Model 

The photoreceptors in the retina transform an optical signal into an electrical signal. 
The transformed signals for a static image, such as edge, intensity and color opponent 
information, are transmitted to the visual cortex through the LGN. A Sobel operator 
was used to implement the edge extraction of the retinal cells. To implement the 
color opponent coding, four broadly tuned color channels were created: 
R = r - (g + b)/2 for red, G = g - (r + b)/2 for green, B = b - (r + g)/2 for blue, and 
Y = (r + g)/2 - |r - g|/2 - b for yellow, where r, g and b denote the red, green and blue 
pixel values, respectively. The RG and BY color opponent coding was obtained by 
considering the on-center and off-surround mechanism. Additionally, we used the 
noise tolerant generalized symmetry transform (NTGST) algorithm to extract 
symmetry information from the edge information [13]. 
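
For illustration, the four broadly tuned color channels can be computed per pixel as in 
the following Python sketch. It is a minimal sketch rather than the authors' code: the 
function name opponent_channels is hypothetical, and the clipping of negative values to 
zero follows the remark made for the same formulas in the simulation section. 

```python
def opponent_channels(r, g, b):
    """Return the (R, G, B, Y) broadly tuned channels for one pixel.

    r, g and b are the red, green and blue pixel values; negative channel
    values are set to zero.
    """
    red = r - (g + b) / 2
    green = g - (r + b) / 2
    blue = b - (r + g) / 2
    yellow = (r + g) / 2 - abs(r - g) / 2 - b
    return tuple(max(0.0, v) for v in (red, green, blue, yellow))
```

A pure red pixel excites only the R channel, a pure yellow pixel (r = g, b = 0) gives 
the strongest Y response, and a gray pixel (r = g = b) gives zero in all four channels. 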




Fig. 1. Bottom-up saliency map model (I: intensity feature, E: edge feature, Sym: 
symmetry feature, RG: red-green opponent coding feature, BY: blue-yellow opponent 
coding feature, CSD & N: center-surround difference and normalization, $\bar{I}$: 
intensity feature map, $\bar{E}$: edge feature map, $\overline{Sym}$: symmetry feature 
map, $\bar{C}$: color feature map, ICA: independent component analysis, SM: saliency 
map, Max: max operator) 



Fig. 1 shows the proposed saliency map model, which is based on the biological visual 
process. The extracted visual information, such as edge, intensity and color opponency, 
is transmitted to the visual cortex through the LGN, in which symmetry information 
can be extracted from the edge information. The extracted information is preprocessed 
to mimic the biological mechanism. In the course of preprocessing, we use a Gaussian 
pyramid with scales from level 0 to level n, in which level n is made by subsampling by 
a factor of 2^n, thus constructing five feature bases: I(·), E(·), Sym(·), RG(·) and 
BY(·). This reflects the non-uniform distribution of the retinotopic structure. Then, 
the center-surround mechanism is implemented in the model as the difference between 
the fine and coarse scales of the Gaussian pyramid images [1]. Consequently, five 
center-surround feature bases, I(c,s), E(c,s), Sym(c,s), RG(c,s) and BY(c,s), 
are obtained by the following equations. 



$$I(c,s) = |I(c) \ominus I(s)| \tag{1}$$

$$E(c,s) = |E(c) \ominus E(s)| \tag{2}$$

$$Sym(c,s) = |Sym(c) \ominus Sym(s)| \tag{3}$$

$$RG(c,s) = |R(c) - G(c)| \ominus |G(s) - R(s)| \tag{4}$$

$$BY(c,s) = |B(c) - Y(c)| \ominus |Y(s) - B(s)| \tag{5}$$

where $\ominus$ represents interpolation to the finer scale and point-by-point subtraction. 
In total, 30 feature bases are computed, because each of the five center-surround 
feature bases has 6 different scales [1]. The 30 feature bases with their scale 
information are combined into four feature maps as shown in Eq. (6), where $\bar{I}$, 
$\bar{E}$, $\overline{Sym}$ and $\bar{C}$ stand for intensity, edge, symmetry, and color 
opponency, respectively. These are obtained through the across-scale addition 
$\oplus$ [1]. 

$$\bar{I} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(I(c,s)), \qquad \bar{E} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(E(c,s)),$$

$$\overline{Sym} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(Sym(c,s)), \qquad \bar{C} = \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} N(RG(c,s) + BY(c,s)) \tag{6}$$
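
The center-surround operation of Eqs. (1) to (3) and the scale pairs of Eq. (6) can be 
illustrated with the following Python sketch. It is a simplified sketch under stated 
assumptions, not the implementation used in the paper: a 2 x 2 box average stands in for 
the Gaussian filter, the interpolation to the finer scale is nearest-neighbor, images 
are plain lists of rows whose sides are divisible by 2^s, and all function names are 
hypothetical. 

```python
def downsample(img):
    """Halve an image by averaging 2x2 blocks (a box filter replaces the Gaussian)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def build_pyramid(img, levels):
    """Level 0 is the input image; level n is subsampled by a factor of 2**n."""
    pyramid = [img]
    for _ in range(levels):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid


def center_surround(pyramid, c, s):
    """|F(c) (-) F(s)|: bring level s up to level c, then subtract point by point."""
    fine, coarse = pyramid[c], pyramid[s]
    factor = 2 ** (s - c)  # nearest-neighbor interpolation factor
    return [[abs(fine[y][x] - coarse[y // factor][x // factor])
             for x in range(len(fine[0]))] for y in range(len(fine))]


# Scale pairs of Eq. (6): c in {2, 3, 4} and s = c + 3 or s = c + 4.
SCALE_PAIRS = [(c, c + d) for c in (2, 3, 4) for d in (3, 4)]
```

The six scale pairs applied to the five feature bases give the 30 center-surround 
feature bases mentioned in the text. 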



In this paper, we use unsupervised learning with ICA to determine the relative 
importance of the different bases used to generate a suitable salient region. Even 
though it is difficult to understand the mechanisms by which the human brain, 
including the visual cortex, processes a complex natural scene, Barlow’s hypothesis 
might be useful in explaining the role of the human brain. 

We suppose that eye movements in bottom-up processing are the result of brain 
activity that maximizes visual information, and that the sequence of eye fixations on 
objects can be modeled by redundancy reduction of the visual information. 
To model the saliency map, we use ICA because the ICA algorithm is the best 
way to reduce redundancy [9]. The ICA algorithm is able to separate the original 
independent signals from a mixed signal by learning the weights of a neural network 
so as to maximize the entropy or log-likelihood of the output signals. Additionally, 
the ICA algorithm can extract important features by minimizing the redundancy, or 
mutual information, between output signals [14]. The purpose of ICA is to seek mutually 
independent components, and it leads to a local representation quite similar to that 
obtained through sparse coding [15]. 

We consider intensity, edge, and color opponent coding in the retina and symmetry 
in the LGN, and use these results as input patches for ICA. Fig. 2 shows the procedure 
for realizing the saliency map from the four feature maps $\bar{I}$, $\bar{E}$, 
$\overline{Sym}$ and $\bar{C}$. In Fig. 2, $E_{ri}$ is obtained by the convolution between 
the r-th channel of the feature maps ($FM_r$) and the i-th filter ($ICs_{ri}$) obtained by 
ICA learning, as shown in Eq. (8). 

$$E_{ri} = FM_r * ICs_{ri} \quad \text{for } i = 1, \ldots, N, \; r = 1, \ldots, 4 \tag{8}$$

where N denotes the number of filters. The convolved feature map $E_{ri}$ represents the 
influence that the four feature maps have on each independent component. Finally, a 
saliency map is obtained using Eq. (9). 

$$S(x,y) = \sum E_{ri}(x,y) \quad \text{for all } i \tag{9}$$

The saliency map S(x,y) is computed by summing all convolved feature maps at every 
location (x, y) in an input image. A salient location P is the location with the maximum 
summed value within a specific window of the saliency map, as shown in Eq. (10). 

$$P = \arg\max_{(x,y)} \Big\{ \sum_{(u,v) \in W} S(u,v) \Big\} \quad \text{for all } (x,y) \tag{10}$$

where W is a window of 20 × 20 pixels and (u,v) ranges over it. The selected salient 
location P is the most salient location of an input image. 
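
The windowed maximum of Eq. (10), together with the masking of already selected areas 
described in the simulation section, can be sketched in Python as follows. This is a 
brute-force illustration under stated assumptions rather than the authors' 
implementation: the saliency map is a list of rows with non-negative values, a location 
is reported as the top-left corner (x, y) of the winning window, masking sets the 
winning window to zero, and the function names are hypothetical. 

```python
def most_salient(saliency, win=20):
    """Return the top-left (x, y) of the win x win window with the largest sum."""
    h, w = len(saliency), len(saliency[0])
    best, best_pos = float("-inf"), None
    for y in range(h - win + 1):
        for x in range(w - win + 1):
            total = sum(saliency[v][u]
                        for v in range(y, y + win)
                        for u in range(x, x + win))
            if total > best:  # strict comparison keeps the first window on ties
                best, best_pos = total, (x, y)
    return best_pos


def scan_path(saliency, n_points, win=20):
    """Select n_points salient windows in turn, masking each one after selection."""
    sal = [row[:] for row in saliency]  # work on a copy
    path = []
    for _ in range(n_points):
        pos = most_salient(sal, win)
        if pos is None:  # map smaller than the window
            break
        path.append(pos)
        x, y = pos
        for v in range(y, y + win):
            for u in range(x, x + win):
                sal[v][u] = 0.0  # mask off the selected focus area
    return path
```

Masking the selected window before the next search is what prevents the same location 
from being selected twice in the scan path. 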



Fig. 2. Realization of the saliency map model from the feature maps, with FM1 = $\bar{I}$, 
FM2 = $\bar{E}$, FM3 = $\overline{Sym}$ and FM4 = $\bar{C}$ ($\bar{I}$: intensity feature map, 
$\bar{E}$: edge feature map, $\overline{Sym}$: symmetry feature map, $\bar{C}$: color feature 
map, SM: saliency map) 
4 Trainable Selective Attention Model 

Although the proposed bottom-up saliency map model generates plausible salient 
areas, a selected area may not be interesting to a human, because the saliency map 
uses only primitive features such as intensity, edge, color and symmetry information. 
To implement a more plausible selective attention model, we need an interactive 
procedure together with the bottom-up information processing. 

Humans ignore an uninteresting area even if it has salient primitive features, and can 
memorize the characteristics of the unwanted area. We do not pay attention to a 
new area whose characteristics are similar to those of a previously learned unwanted 
area. We propose a new selective attention model that mimics such a human-like 
selective attention mechanism by considering not only the primitive input features but 
also interaction with the environment. Moreover, the human brain can learn and 
memorize many new things without catastrophic forgetting of existing ones. It is well 
known that the ART network can easily be trained on additional input patterns and can 
also solve the stability-plasticity dilemma found in other neural network models. 
Therefore, we use the ART network together with the bottom-up saliency map model to 
implement a trainable selective attention model that can interact with a human 
supervisor. During the training process, the ART network learns and memorizes the 
characteristics of the uninteresting areas selected by the bottom-up saliency map 
model. The uninteresting areas are decided by the human supervisor. After successful 
training of the ART network, an unwanted salient area is inhibited according to the 
vigilance value of the ART network. 

Fig. 3 shows the architecture of the trainable attention model during the training 
process. 




Fig. 3. The architecture of the training mode of the proposed trainable selective 
attention model ($\bar{I}$: intensity feature map, $\bar{E}$: edge feature map, 
$\overline{Sym}$: symmetry feature map, $\bar{C}$: color feature map, ICA: independent 
component analysis, SM: saliency map). Square blocks 1 and 3 in the SM are interesting 
areas, but block 2 is an uninteresting area 

In Fig. 3, the attention area obtained from the saliency map is input to the ART model, 
and a supervisor then decides whether it is a truly salient area or an unwanted area. If 
the selected area is unwanted even though it has salient features, the ART model 
learns and memorizes that area. If the selected area is interesting, it is not involved 
in the training of the ART network. In the proposed model, we use the ART1 
model; thus, all inputs to the ART network are transformed into binary vectors. 

Fig. 4 shows the architecture of the proposed model in testing mode. After training 
on the unwanted salient areas has finished successfully, the ART network has 
memorized the characteristics of the unwanted areas. If a salient area selected by the 
bottom-up saliency map model in a test image has characteristics similar to those in 
the ART memory, it is ignored by inhibiting the saliency map model. In the proposed 
model, the vigilance value of the ART model is used as the decision parameter for 
whether the selected area is interesting. When an unwanted salient area is input 
to the ART model, the vigilance value is higher than a threshold, which means that the 
area has characteristics similar to those of the trained unwanted areas. Therefore, the 
ART model inhibits such unwanted salient areas so that no attention is paid to them. In 
contrast, when an interesting salient area is input to the ART model, the vigilance value 
is lower than the threshold, which means that such an area has not been trained and is 
an interesting attention area. As a result, the proposed model focuses on desired 
attention areas and does not focus on salient areas with unwanted features. 
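
The role of the vigilance test can be illustrated with a strongly simplified ART1-style 
memory in Python. This is a sketch under stated assumptions, not the network used in 
the paper: it keeps only the ART1 match ratio |x AND w| / |x| and fast learning by 
intersection, omits the choice function and the reset search of a full ART1 network, 
and uses hypothetical names (UnwantedAreaMemory, match_ratio) and a hypothetical 
vigilance of 0.8. 

```python
def match_ratio(x, prototype):
    """ART1 match: fraction of the input's active bits shared with a prototype."""
    active = sum(x)
    if active == 0:
        return 0.0
    return sum(a & b for a, b in zip(x, prototype)) / active


class UnwantedAreaMemory:
    """Stores binary prototypes of areas the supervisor marked as unwanted."""

    def __init__(self, vigilance=0.8):
        self.vigilance = vigilance  # assumed threshold, not taken from the paper
        self.prototypes = []

    def train(self, x):
        """Refine a matching prototype, or store x as a new one."""
        for i, proto in enumerate(self.prototypes):
            if match_ratio(x, proto) >= self.vigilance:
                # Fast learning: keep only the bits shared by input and prototype.
                self.prototypes[i] = [a & b for a, b in zip(x, proto)]
                return
        self.prototypes.append(list(x))

    def is_inhibited(self, x):
        """True if x resembles a memorized unwanted area closely enough."""
        return any(match_ratio(x, p) >= self.vigilance for p in self.prototypes)
```

In testing mode, a salient area whose binary feature vector passes the vigilance test 
against a stored prototype would be skipped, and the next salient area would be 
examined instead. 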




Fig. 4. The architecture of the testing mode of the proposed trainable selective 
attention model using the ART network ($\bar{I}$: intensity feature map, $\bar{E}$: edge 
feature map, $\overline{Sym}$: symmetry feature map, $\bar{C}$: color feature map, ICA: 
independent component analysis, SM: saliency map). Square blocks 1 and 3 in the SM 
are interesting areas, but block 2 is a trained uninteresting area 

5 Computer Simulation and Results 

In our simulation, we used preprocessed four-channel images of color natural scenes. 
A Sobel operator applied to the gray-level image was used to implement the edge 
extraction of the retinal cells [14]. To implement the color opponent coding, four 
broadly tuned color channels were created: R = r - (g + b)/2 for red, 
G = g - (r + b)/2 for green, B = b - (r + g)/2 for blue, and 
Y = (r + g)/2 - |r - g|/2 - b for yellow (negative values are set to zero), where r, g 
and b denote the red, green and blue pixel values, respectively. The RG and BY color 
opponent coding was obtained by considering the on-center and off-surround 
mechanism of the LGN, as shown in Eqs. (4) and (5). Also, we used the noise 
tolerant generalized symmetry transform (NTGST) algorithm to extract symmetry 
information from the edge information [16]. 

Fig. 5. Procedure of realizing the ICA filter 

In our simulation, to obtain the ICA filters, we derived the four feature maps 
($\bar{I}$, $\bar{E}$, $\overline{Sym}$ and $\bar{C}$) that are used as input patches for ICA. 
Fig. 5 shows the procedure for realizing the ICA filters. We used 20,000 randomly 
selected patches of 7 × 7 × 4 pixels from the four feature maps. Each sample forms a 
column of the input matrix, which has 196 rows and 20,000 columns. The basis 
functions are determined using the extended infomax algorithm [15]. The learned basis 
functions have a dimension of 196 × 196. Each row of the basis functions represents a 
filter, and the rows are ordered according to the length of the filter vector. 
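
How a 7 × 7 × 4 patch becomes a 196-element column of the ICA input matrix can be 
illustrated with the following Python sketch. Only the sampling and flattening step is 
shown; the extended infomax learning itself is not reproduced. The function names, the 
flattening order (feature map first, then row, then column) and the fixed random seed 
are assumptions made for the example. 

```python
import random


def extract_patch(feature_maps, x, y, size=7):
    """Flatten a size x size window taken from every feature map into one vector."""
    return [fm[y + dy][x + dx]
            for fm in feature_maps      # four maps: intensity, edge, symmetry, color
            for dy in range(size)
            for dx in range(size)]


def sample_patches(feature_maps, n_patches, size=7, seed=0):
    """Draw n_patches random patch vectors; each one is a column of the ICA input."""
    rng = random.Random(seed)
    h, w = len(feature_maps[0]), len(feature_maps[0][0])
    patches = []
    for _ in range(n_patches):
        x = rng.randrange(w - size + 1)
        y = rng.randrange(h - size + 1)
        patches.append(extract_patch(feature_maps, x, y, size))
    return patches
```

With four feature maps and size = 7, every patch vector has 7 × 7 × 4 = 196 elements, 
which matches the 196 rows of the input matrix and the 196 × 196 basis functions. 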

We applied the ICA filters to the four-channel images, that is, the four feature maps 
of color natural scenes, to obtain the salient points [7]. We computed the most salient 
point P as shown in Eq. (10). Then an appropriate focus area centered on the most 
salient location was masked off, and the next salient location in the input image was 
calculated using the saliency map model. This ensures that a previously selected 
salient location is not selected again. 

Fig. 6 shows the experimental results of the proposed saliency map model. To show 
the effectiveness of the proposed saliency map model, we used simple images, such as 
motorcycle and traffic signal images, that contain a unique salient object. In 
Figs. 6 (b) and (d), the bright region in the saliency map (SM) can be considered the 
most salient region. As shown in these figures, the proposed method successfully 
generates plausible salient locations, such as the yellow motorcycle and the red 
traffic signal, in a natural scene. 




(c) Traffic sign (d) SM of traffic sign 

Fig. 6. Experimental results of saliency map model of simple natural images 



Fig. 7 shows the experimental results for a complex natural image. The preprocessed 
feature maps ($\bar{I}$, $\bar{E}$, $\overline{Sym}$ and $\bar{C}$) from the color image are 
convolved with the ICA filters to construct a saliency map. First, we compute the most 
salient region by Eq. (10). Then an appropriate focus area centered on the most salient 
location is masked off, and the next salient location in the input image is calculated 
using the saliency map model. This ensures that a previously selected salient location 
is not selected again. Fig. 7 also shows the successive salient regions. 

Fig. 7. Experimental result of the proposed saliency map model for a natural image: the 
four generated feature maps, the saliency map and the successive salient regions 

Fig. 8 shows examples of scan paths generated by the proposed saliency map 
model for various natural images. Figs. 8 (a) and (d) are input images such as car, 
flower and street images. Figs. 8 (b) and (e) show the saliency map (SM) of each 
input image, and Figs. 8 (c) and (f) are the scan paths generated for each input image. 
As shown in Fig. 8, the proposed model successfully generates human-like scan 
paths for complex natural images. 




(a) Flower image (b) SM of flower image (c) Scan paths of flower image 




(d) Street image (e) SM of street image (f) Scan paths of street image 



Fig. 8. Scan path examples generated by the proposed saliency map model for various images 
with complex background 
Fig. 9 shows the simulation of the proposed trainable selective attention 
model during the training process. The numbers in Fig. 9 represent the order of the 
scan path according to the degree of saliency. In Fig. 9, the 5th salient area is judged 
to be an unwanted salient region by the human supervisor. In the ART network, the 
uninteresting 5th salient area is learned through modification of the weights, 
or long-term memory (LTM) traces. The other four interesting salient areas are not 
trained in the ART network. Fig. 10 shows the simulation results of the proposed 
trainable selective attention model in testing mode after the training process has 
finished successfully. In Fig. 10, 4, and 4^ represent the input and output attention 
sequences at the k-th salient area, respectively. The 5th selected salient area shows a 
high vigilance value because it has already been trained and memorized in the weights 
of the ART network, as shown in Fig. 9. Therefore, Fig. 10 shows that the uninteresting 
salient area at 4^ is not excited as an attention area even though it is selected as one 
of the salient areas by the bottom-up saliency map model. 



Fig. 9. Simulation example of the training process for the proposed trainable selective 
attention model (panels: saliency map, input image) 

Fig. 10. Simulation example of the testing mode for the proposed trainable selective 
attention model after training is successfully finished (panels: saliency map, input 
image) 



Fig. 11 shows the simulation results for several test images that were not 
used in the training process. The proposed trainable selective attention model shows 
reasonable performance for many natural scenes. In Fig. 11, the first column shows 
the input natural scenes, the second column shows the attention areas selected using 
only the bottom-up saliency map model, and the last column shows the attention areas 
selected using the proposed trainable selective attention model. The results in 
Fig. 11 show that our proposed model can focus on more reasonable salient areas, as 
the human visual system does. 

Fig. 12 shows simulation results on the repeatability of the system output when it is 
applied to several frames of a video sequence. Figs. 12 (a) and (b) show the 
selective attention for the first and second frames, respectively. In Fig. 12 
(a), the supervisor chose the 5th attention area, shown by a white box, as an inhibition 
area. In the second frame, the area inhibited in the first frame was not selected 
as an attention area. The supervisor also inhibited the two areas marked 2 and 3 in 
the second frame, which are likewise shown by white boxes in Fig. 12 (b). In the 10th 
frame, the two areas inhibited in the second frame were not selected as 
attention areas, as shown in Fig. 12 (c). In Fig. 12 (c), the supervisor chose the 5th 
attention area as an inhibition area. Fig. 12 (d) is the 11th frame, after the focus has 
moved to the most salient area selected in Fig. 12 (c). The selective attention results 
in Fig. 12 (d) show that the areas inhibited in Figs. 12 (b) and (c) are not selected 
as attention areas. The experimental results in Fig. 12 show that our proposed model 
can work well for a video sequence. 






Fig. 11. Comparison of the simulation results obtained using only the bottom-up 
saliency map model and using the ART network together with the bottom-up saliency 
map model 

(c) 10th frame (d) 11th frame 

Fig. 12. The simulation results for the video sequence 



6 Conclusion 

We proposed a trainable selective attention model that can inhibit an unwanted salient 
area and focus only on an interesting area in a static natural scene. The proposed 
model was implemented using an adaptive resonance theory (ART) network in 
conjunction with a biologically motivated bottom-up saliency map model. Computer 
simulation results show that the proposed method yields reasonable salient regions 
and a scan path that pays no attention to unwanted areas. Our proposed trainable 
selective attention model can autonomously focus on a salient area in a natural 
scene and can also inhibit attention to an unwanted area or object through 
incremental learning, which can play an important role in the autonomous incremental 
intelligence required for better humanoid systems. As further work, a selective 
attention model with an excitation mechanism for desired areas as well as an 
enhanced inhibition mechanism is under investigation. 

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Abstract. According to the No Free Lunch (NFL) theorems all black- 
box algorithms perform equally well when compared over the entire set of 
optimization problems. An important problem related to NFL is finding 
a test problem for which a given algorithm is better than another given 
algorithm. Of high interest is finding a function for which Random Search 
is better than another standard evolutionary algorithm. In this paper we 
propose an evolutionary approach for solving this problem: we will evolve 
test functions for which a given algorithm A is better than another given 
algorithm B. Two ways for representing the evolved functions are em- 
ployed: as GP trees and as binary strings. Several numerical experiments 
involving NFL-style Evolutionary Algorithms for function optimization 
are performed. The results show the effectiveness of the proposed ap- 
proach. Several test functions for which Random Search performs better 
than all other considered algorithms have been evolved. 



1 Introduction 

Since the advent of the No Free Lunch (NFL) theorems in 1995 [17,18], the 
trend of Evolutionary Computation (EC) [9] has not changed at all, although 
these breakthrough theories should have produced dramatic changes. Most 
researchers chose to ignore the NFL theorems: they developed new algorithms that 
work better than the old ones on some particular test problems. The researchers 
have eventually added: 

"The algorithm A performs better than another algorithm on the considered 
test functions". 

That is somewhat useless, since the proposed algorithms cannot be the best 
on all the considered test functions. 

Moreover, most of the functions employed for testing algorithms are artifi- 
cially constructed. 

Consider, for instance, the field of evolutionary single-criteria optimization, 
where most of the algorithms were tested and compared on some artificially 
constructed test functions (most of them being known as the De Jong test problems) 
[9,14]. These test problems were used for comparison purposes before the birth 
of the NFL theorems and they are used even today (9 years after the birth 
of the NFL theorems). Evolutionary multi-criteria optimization was treated in a 
similar manner: most of the recent algorithms in this field were tested on several 
artificially constructed test functions proposed by K. Deb in [3]. 

Roughly speaking, the NFL theorems state that all black-box optimization 
algorithms perform equally well over the entire set of optimization problems. 
Thus, if an algorithm A is better than another algorithm B on some classes of 
functions, the algorithm B is better than A on the rest of the functions. 
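
As a small illustration of this statement (not taken from the paper), the sketch below enumerates every function on a tiny search space and compares two hypothetical search procedures that each evaluate two distinct points. The search space, the two procedures, and all names are assumptions chosen only to keep the enumeration readable.

```python
from itertools import product

X = [0, 1, 2]   # tiny search space (assumption for illustration)
Y = [0, 1]      # possible cost values (minimization)

def run_a(f):
    """Hypothetical algorithm A: evaluate point 0, then point 1."""
    return min(f[0], f[1])

def run_b(f):
    """Hypothetical algorithm B: evaluate point 2, then adapt the second choice."""
    first = f[2]
    second = f[0] if first == 0 else f[1]
    return min(first, second)

# Every function f: X -> Y, written as the tuple (f(0), f(1), f(2)).
all_functions = list(product(Y, repeat=len(X)))

# Best cost found after two evaluations, averaged over all functions.
avg_a = sum(run_a(f) for f in all_functions) / len(all_functions)
avg_b = sum(run_b(f) for f in all_functions) / len(all_functions)

print(avg_a, avg_b)
```

Averaged over all eight functions the two procedures score the same, yet on individual functions one beats the other: A wins on (0, 1, 1) and B wins on (1, 1, 0).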

As a consequence of the NFL theories, even a computer program (imple- 
menting an Evolutionary Algorithm (EA)) containing programming errors can 
perform better than some other highly tuned algorithms for some test functions. 

Random Search (RS), being a black-box search / optimization algorithm, 
should perform better than all of the other algorithms for some classes of test 
functions. Even if this statement is true, there is no result reported in the 
literature of a test function for which RS performs better than all the other 
algorithms (taking into account the NFL restriction concerning the number of 
distinct solutions visited during the search). However, a function that is hard 
for all Evolutionary Algorithms is presented in [4]. 

Instead, a lot of effort is spent on proving that the No Free Lunch theorems 
are not true. Most researchers [4,5,12,15,16] have tried to find some classes of 
problems for which NFL does not hold. For instance, it is shown in [5] that NFL 
might not hold for small problems (that have a small search space). 

Three questions (on how we match problems to algorithms) are of high interest: 

- For a given class of problems, what is (are) the algorithm(s) that performs 
(perform) better than all other algorithms? 

- For a given algorithm, what is (are) the class(es) of problems for which the 
algorithm performs best? 

- Given two algorithms A and B, what is (are) the class(es) of problems for 
which A performs better than B? 

Answering these questions is not an easy task. All these problems are still 
open questions and they probably lie in the class of the NP-Complete problems. If 
this assumption is true it means that we do not know if we are able to construct 
a polynomial algorithm that takes a function as input and outputs the best 
optimization algorithm for that function (and vice versa). Fortunately, we can 
try to develop a heuristic algorithm able to handle this problem. 

In this paper we develop a framework for constructing test functions that 
match a given algorithm. More specifically, given two algorithms A and B, the 
question is: 

What are the functions for which A performs better than B (and vice versa)? 

For obtaining such functions we will use an evolutionary approach: the functions 
matched to a given algorithm are evolved by using a standard evolutionary 
algorithm. 

Of high interest is finding a test function for which Random Search performs 
better than all considered standard evolutionary algorithms. Using the proposed 
approach we were able to evolve such test problems. 

The paper is organized as follows: The NFL-style algorithm is described in detail 
in section 2. Test functions represented as GP trees are evolved in section 3. 
The fitness assignment process is described in section 3.1. The algorithms used 
for comparison are described in section 3.2. Several numerical experiments are 
carried out in section 3.3. Test functions represented as binary strings are evolved 
in section 4. The fitness assignment process is described in section 4.2. The 
algorithms used for comparison are described in section 4.3. Several numerical 
experiments are carried out in section 4.4. 



2 A NFL-Style Algorithm 

We define a black-box optimization algorithm as indicated by Wolpert and 
Macready in [17,18]. 

The evolutionary model (the NFL-style algorithm) employed in this paper 
uses a population consisting of a single individual. This considerably simplifies 
the description and the implementation of an NFL-style algorithm. 

No archive for storing the best solutions found so far (see for instance Pareto 
Archived Evolution Strategy [11]) is maintained. However, we implicitly maintain 
an archive containing all the distinct solutions explored until the current state. 
We do so because only the number of distinct solutions is counted in the NFL 
theories. This kind of archive is also employed by Tabu Search [7,8]. 

The variables and the parameters used by a NFL-style algorithm are given 
in Table 1. 

The algorithm starts with a randomly generated solution (the current solution) 
over the search space. This solution is added to the archive. The following 
steps are executed until MAXSTEPS different solutions are explored. A solution 
is generated in the neighborhood of the current solution; this new solution is 
usually obtained by mutating the current solution. We have to ensure that the 
newly generated solution is different from all previously explored solutions (the 
algorithm that generates a solution different from all other solutions explored so 
far is given further in this section). We add the generated solution to the archive 
and it becomes the current solution, which will be further explored. 

The NFL-style algorithm is the following: 



¹ Source code used for evolving test functions is available at www.nfl.cs.ubbcluj.ro 
² Test functions for which Random Search performs better than other standard 
Evolutionary Algorithms are available at www.nfl.cs.ubbcluj.ro 

Table 1. The variables used by the NFL algorithm. 

Variable    Meaning 
Archive     the archive storing all distinct solutions visited by the algorithm 
curr_sol    the current solution (point in the search space) 
new_sol     a new solution (obtained either by mutation or by initialization) 
MAXSTEPS    the number of generations (the number of distinct points in the 
            search space visited by the algorithm) 
t           the number of distinct solutions explored so far 



NFL-style Algorithm 

S1.  Archive = ∅; 
S2.  Randomly initialize the current solution (curr_sol) 
     // add the current solution to the archive 
S3.  Archive = Archive + {curr_sol}; 
S4.  t = 1; 
S5.  while t < MAXSTEPS do 
S6.      Select a new solution (new_sol) in 
         the neighborhood of the curr_sol 
S7.      Archive = Archive + {new_sol}; 
S8.      curr_sol = new_sol; 
S9.      t = t + 1; 
S10. endwhile 

An important issue concerning the NFL algorithm described above is related 
to the step S6, which selects a new solution that does not belong to the Archive. 
This is usually done by mutating the current solution and keeping the offspring if 
the latter does not already belong to the Archive (the actual acceptance mechanism 
is described in detail in sections 3.2 and 4.3). If the offspring belongs to 
the Archive for a fixed number of mutations (steps), it means that the neighborhood 
of the current solution could be exhausted (completely explored). In 
this case, a new random solution is generated and the search process moves to 
another region of the search space. It is sometimes possible that the generated 
solution already belongs to the Archive. In this case, another random solution 
is generated over the search space. We assume that the search space is large 
enough and that after a finite number of re-initializations the generated solution will 
not belong to the Archive. 

The algorithm for selecting a new solution which does not belong to the 
Archive (the step S6) is given below: 

SS1. nr_mut = 0; // the number of mutations is set to 0 
SS2. Repeat 
SS3.     new_sol = Mutate(curr_sol); 
SS4.     nr_mut = nr_mut + 1; 
SS5. until (nr_mut = MAX_MUTATIONS) or ((new_sol ∉ Archive) and Accepted(new_sol)); 
SS6. while new_sol ∈ Archive do 
SS7.     Initialize(new_sol); // we jump to another randomly chosen point of 
         the search space 
SS8. endwhile 
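
The two procedures above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the acceptance rule (offspring no worse than the parent), the bit-flip mutation, the default of 10 mutation attempts, and the example cost function are all hypothetical choices made for illustration.

```python
import random

def select_new_solution(curr_sol, archive, mutate, accepted, random_solution,
                        max_mutations):
    """Step S6: return a solution that is not yet in the archive."""
    new_sol = curr_sol
    for _ in range(max_mutations):
        new_sol = mutate(curr_sol)
        if new_sol not in archive and accepted(new_sol):
            break
    # If the last offspring was already visited, jump to a random point
    # (repeated until an unvisited point is found).
    while new_sol in archive:
        new_sol = random_solution()
    return new_sol

def nfl_style_search(f, random_solution, mutate, max_steps, max_mutations=10):
    """Visit exactly max_steps distinct solutions and return the best cost."""
    archive = set()
    curr_sol = random_solution()
    archive.add(curr_sol)
    best = f(curr_sol)
    t = 1
    while t < max_steps:
        parent_cost = f(curr_sol)
        new_sol = select_new_solution(
            curr_sol, archive, mutate,
            # Assumed acceptance rule: offspring must not be worse than parent.
            accepted=lambda s: f(s) <= parent_cost,
            random_solution=random_solution,
            max_mutations=max_mutations,
        )
        archive.add(new_sol)
        curr_sol = new_sol
        best = min(best, f(new_sol))
        t += 1
    return best, archive

def flip_one_bit(x, n_bits=16):
    """Hypothetical mutation: flip one random bit of a 16-bit integer."""
    return x ^ (1 << random.randrange(n_bits))

random.seed(0)
best, archive = nfl_style_search(
    f=lambda x: (x - 12345) ** 2,
    random_solution=lambda: random.randrange(2 ** 16),
    mutate=flip_one_bit,
    max_steps=200,
)
print(best, len(archive))
```

Storing the archive as a set makes the membership test cheap, and because every step adds one previously unvisited point, the archive size equals the number of distinct solutions counted by the NFL theorems.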



3 Real-Valued Functions 

Test functions represented as GP trees are evolved in this section. 

3.1 Evolutionary Model and the Fitness Assignment Process 

Our aim is to find a test function for which a given algorithm A performs better 
than another given algorithm B. The test function that is being searched for will 
be evolved by using Genetic Programming [10] with steady state [13]. 

The quality of the test function encoded in a GP chromosome is computed 
in a standard manner. The given algorithms A and B are applied to the test 
function. These algorithms will try to optimize (find the minimal value of) that 
test function. To avoid the lucky guesses of the optimal point, each algorithm is 
run 500 times and the results are averaged. Then, the fitness of a GP chromosome 
is computed as the difference between the averaged results of the algorithm A 
and the averaged results of the algorithm B. In the case of function minimization, 
a negative fitness of a GP chromosome means that the algorithm A performs 
better than the algorithm B (the values obtained by A are smaller (on average) 
than those obtained by B). 
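
The fitness assignment described above can be sketched as follows. This is a minimal illustration, assuming that each algorithm is available as a callable that takes the test function and returns the best (smallest) value it found in one run; the two toy algorithms and their names are hypothetical stand-ins, not the algorithms compared in the paper.

```python
import random
from statistics import mean

def fitness(test_function, algorithm_a, algorithm_b, runs=500):
    """Averaged result of A minus averaged result of B (minimization).

    A negative value means that A found smaller values than B on average.
    """
    avg_a = mean(algorithm_a(test_function) for _ in range(runs))
    avg_b = mean(algorithm_b(test_function) for _ in range(runs))
    return avg_a - avg_b

def sample_50_points(f):
    """Hypothetical algorithm: best of 50 random points in [0, 1]."""
    return min(f(random.random()) for _ in range(50))

def sample_1_point(f):
    """Hypothetical algorithm: a single random point in [0, 1]."""
    return f(random.random())

random.seed(0)
fit_identity = fitness(lambda x: x, sample_50_points, sample_1_point)
fit_constant = fitness(lambda x: 0.0, sample_50_points, sample_1_point)
print(fit_identity, fit_constant)
```

On a constant function both algorithms return the same value, so the fitness is zero, which corresponds to the case f(x) = 0 in which neither algorithm is better.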

3.2 Algorithms Used for Comparison 

We describe several evolutionary algorithms used for comparison purposes. 
All the algorithms described in this section are embedded in the NFL-style 
algorithm described in section 2. More precisely, the considered algorithms 
particularize the solution representation, the mutation operator, and the accep- 
tance mechanism (the procedure Accepted) of the NFL algorithm described in 
section 2. The mutation operator is the only search operator used for exploring 
the neighborhood of a point in the search space. 

A1 - real encoding (the individuals are represented as real numbers using 32 
bits), Gaussian mutation with σ1 = 0.001, the parent and the offspring compete 
for survival. 

A2 - real encoding (the individuals are represented as real numbers using 32 
bits), Gaussian mutation with σ2 = 0.01, the parent and the offspring compete 
for survival. 

A3 - binary encoding (the individuals are represented as binary strings of 
32 bits), point mutation with pm = 0.3, the parent and the offspring compete 
for survival. 

A4 - binary encoding (the individuals are represented as binary strings of 32 
bits), point mutation with pm = 0.1, the parent and the offspring compete for 
survival. 
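
The ingredients shared by these four algorithms can be sketched as below. This is a hedged illustration only: clamping mutated values to [0, 1], flipping each bit independently with probability pm, and letting the offspring win ties are assumptions, since the paper does not spell out these details.

```python
import random

def gaussian_mutation(x, sigma, low=0.0, high=1.0):
    """Add N(0, sigma) noise to a real value; clamping is an assumption."""
    return min(high, max(low, x + random.gauss(0.0, sigma)))

def point_mutation(bits, pm):
    """Flip each bit independently with probability pm (assumed reading)."""
    return [b ^ 1 if random.random() < pm else b for b in bits]

def compete(parent, offspring, f):
    """Parent and offspring compete for survival (minimization)."""
    return offspring if f(offspring) <= f(parent) else parent

random.seed(1)
child_real = gaussian_mutation(0.5, sigma=0.01)        # A2-style step
child_bits = point_mutation([0] * 32, pm=0.3)          # A3-style step
survivor = compete(0.2, 0.1, f=lambda x: x)            # offspring is better
print(child_real, sum(child_bits), survivor)
```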

3.3 Numerical Experiments 

Several numerical experiments for evolving functions matched to a given algo- 
rithm are performed in this section. The algorithms used for comparison have 
been described in section 3.2. 

The number of dimensions of the space is set to 1 (i.e. one-dimensional func- 
tions) and the definition domain of the evolved test functions is [0, 1]. 

The parameters of the GP algorithm are given in Table 2. 



Table 2. The parameters of the GP algorithm used for numerical experiments 

Parameter                Value 
Population size          50 
Number of generations    10 
Maximal GP tree depth    6 
Function set             F = {+, -, *, sin, exp} 
Terminal set             T = {x} 
Crossover probability    0.9 
Mutation                 1 mutation / chromosome 
Runs                     30 



The small number of generations (only 10) proved to be sufficient 
for the experiments performed in this paper. 

Evolved functions are given in Table 3. For each pair (Ak, Aj), the table gives the 
evolved test function for which the algorithm Ak performs better than the algorithm 
Aj. The mean of the fitness of the best GP individual over 30 runs is also 
reported. 

From Table 3 it can be seen that the proposed approach made it possible to 
evolve test functions matched to most of the given algorithms. The 
results of these experiments give a first impression of how difficult the problems 
are. Several interesting observations can be made: 

Table 3. The evolved test functions 

Algorithms    Evolved Test Function    Averaged fitness 
(A1, A2)      f1(x) = 0                0 
(A2, A1)      f2(x) = -6x^? - x        -806.03 
(A3, A4)      f3(x) = x - 2x^?         -58.22 
(A4, A3)      f4(x) = -4x^?            -34.09 
(A2, A4)      f5(x) = 0                0 
(A4, A2)      f6(x) = -6x^? - x        -1601.36 



The GP algorithm was able to evolve a function for which the algorithm A2 
(real encoding with σ = 0.01) was better than the algorithm A1 (real encoding 
with σ = 0.001) in all the runs (30). However, the GP algorithm was not able 
to evolve a test function for which the algorithm A1 is better than the algorithm 
A2. In this case the function f(x) = 0 (where both algorithms perform the same) 
was the only one to be found. It seems to be easier to find a function for which an 
algorithm with larger "jumps" is better than an algorithm with smaller "jumps" 
than to find a function for which an algorithm with smaller "jumps" is better 
than an algorithm with larger "jumps". 

A test function for which the algorithm A4 (binary encoding) is better than 
the algorithm A2 (real encoding) was easy to find. The reverse (i.e. a test function 
for which the real encoding algorithm A2 is better than the binary encoded 
algorithm A4) has not been found by using the GP parameters considered in 
Table 2. 



4 Binary-Valued Functions 

Test functions represented as binary strings are evolved in this section. We em- 
ployed the binary-strings representation for the test functions because in this way 
we can evolve any function without being restricted to a given set of operators. 



4.1 Prerequisite 

Our analysis is performed in the finite search space X [17]. The space of possible 
"cost" values, Y, is also finite. We restrict our analysis to binary search spaces. 
This is not a hard restriction since all other values can be represented as binary 
strings. Thus $X = \{0,1\}^n$ and $Y = \{0,1\}^m$. 

An optimization problem f is represented as a mapping $f : X \to Y$. 

The set $F = Y^X$ denotes the space of all possible problems. The size of F is $|Y|^{|X|}$. 

In our experiments n = 16 and m = 8. Thus $|X| = 2^{16} = 65536$ 
and $|Y| = 2^{8} = 256$. The number of optimization problems in this class is 

$|Y|^{|X|} = 256^{65536} \approx 10^{157826}$ 

Each test problem in this class can be stored in a string of 65536 * 256 bits = 
2048 KB = 2 MB. 
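
The sizes quoted above can be checked with a few lines of arithmetic. The sketch below simply recomputes the figures as they are stated in the text (including the 256 bits per point used for the storage estimate); the variable names are illustrative.

```python
import math

n, m = 16, 8
size_x = 2 ** n          # |X|, number of points in the search space
size_y = 2 ** m          # |Y|, number of possible cost values

# |Y|^|X| is far too large to print, so work with its base-10 logarithm.
log10_num_problems = size_x * math.log10(size_y)

# Storage estimate as stated in the text: 65536 * 256 bits.
bits_per_problem = size_x * 256
kilobytes = bits_per_problem / 8 / 1024

print(size_x, size_y, math.floor(log10_num_problems), kilobytes)
```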

Within this huge search space we will try to find test problems for which a 
given algorithm A is better than another given algorithm B. 

4.2 Evolutionary Model and the Fitness Assignment Process 

Our aim is to find a test function for which a given algorithm A performs better 
than another given algorithm B. The test function that is being searched for will 
be represented as strings over the {0,1} alphabet as described in section 4.1. 

The algorithm used for evolving these functions is a standard steady state 
[13] Evolutionary Algorithm that works with a binary encoding of individuals [1, 
2]. 

The most important aspect of this algorithm regards the way in which the 
fitness of an individual is computed. 

The quality of the test function encoded in a chromosome is computed as 
follows: The given algorithms A and B are applied to the test function. These 
algorithms will try to optimize (find the minimal value of) that test function. 
To avoid the lucky guesses of the optimal point, each algorithm is run 100 times 
and the results are averaged. Then, the fitness of a chromosome encoding a 
test function is computed as the difference between the averaged results of the 
algorithm A and the averaged results of the algorithm B. 

In the case of function minimization, a negative fitness of a chromosome 
means that the algorithm A performs better than the algorithm B (the values 
obtained by A are smaller (on average) than those obtained by B). 

4.3 Algorithms Used for Comparison 

We describe several evolutionary algorithms used for comparison purposes. All 
the algorithms described in this section are embedded in the NFL algorithm 
described in section 2. More precisely, the considered algorithms particularize the 
solution representation, the mutation operator, and the acceptance mechanism 
(the procedure Accepted) of the NFL algorithm described in section 2. 

All algorithms are derived from (1+1) ES and can be described as follows: 

(i) Individuals are represented as binary strings over the search space X. 

(ii) Mutation operator is the only search operator used for exploring the neigh- 
borhood of a point in the search space. 

(iii) The parent and the offspring compete for survival. 

The number of mutations / chromosome is a parameter of the compared 
algorithms. The range for this parameter is 1 up to chromosome length. If the 
number of mutations / chromosome is equal to the chromosome length, the 
considered algorithm will behave like Random Search. 
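
One possible reading of "k mutations per chromosome" is sketched below, assuming that each mutation flips one randomly chosen bit and that positions may repeat. The paper does not give the operator in this detail, so the function name and this choice are assumptions made for illustration.

```python
import random

def mutate_k(bits, k):
    """Apply k point mutations, each flipping one randomly chosen bit."""
    child = list(bits)
    for _ in range(k):
        pos = random.randrange(len(child))
        child[pos] ^= 1          # flip the bit at the chosen position
    return child

random.seed(2)
parent = [0] * 16
child = mutate_k(parent, k=5)
hamming = sum(p != c for p, c in zip(parent, child))
print(child, hamming)
```

Under this reading a small k keeps the offspring close to the parent, while a k near the chromosome length scatters it over the search space.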

Since the number of mutations / chromosome is different from algorithm to 
algorithm, we denote by Bk the NFL-style Algorithm that performs k mutations 
/ chromosome. 

4.4 Numerical Experiments 

Several numerical experiments for evolving functions matched to a given algo- 
rithm are performed in this section. The algorithms used for comparison have 
been described in section 4.3. 

The parameters of the algorithm used for evolving test functions are given 
in Table 4. 



Table 4. The parameters of the algorithm used for evolving test functions 

Parameter                Value 
Population size          10 
Number of generations    10 
Crossover type           Uniform 
Crossover probability    0.9 
Mutation type            Point mutation 
Mutation probability     0.01 
Chromosome length        65536*256 bits 
Runs                     30 



The small number of generations (only 10) proved to be sufficient 
for the experiments performed in this paper. 

Results are given in Table 5. For each pair (Bk, Bj), the table gives the average (over 
30 runs) of the best fitness scored by an individual (encoding a test function) for 
which the algorithm Bk performs better than the algorithm Bj. 

Table 5 shows that the proposed approach made it possible to evolve test 
functions matched to all given algorithms (all fitness values are negative). The 
results of these experiments give a first impression of how difficult the problems 
are. Several interesting observations can be made: 

In the first row of data (corresponding to the algorithm B1) the average 
fitness decreases from -95 (for the pair (B1, B2)) to -284 (for the pair (B1, B16)). 
Knowing that a negative value of the fitness means that the algorithm B1 is 
better than the algorithm Bj, we may infer that it is easier to find a function 
for which an algorithm performing 1 mutation/chromosome is better than an 
algorithm performing 16 mutations/chromosome (the algorithm B16, which 
actually behaves like Random Search) than to find a test function for which 
an algorithm performing 1 mutation/chromosome is better than an algorithm 
performing 2 mutations/chromosome. 

If we look at each row of data after the cell on the main diagonal, we 
can see that the values have a descending tendency. This means that it is easier 
to beat an algorithm performing more mutations than to beat an algorithm 
performing fewer mutations / chromosome. 

Table 5. Fitness of the best individual in the last generation. Results are averaged 
over 30 independent runs. The entry in row Bk and column Bj corresponds to the pair (Bk, Bj). 

      B1   B2   B3   B4   B5   B6   B7   B8   B9  B10  B11  B12  B13  B14  B15  B16 
B1     -  -95  -53  -19 -102  -69  -79 -108 -151 -214 -182 -192 -159 -246 -259 -284 
B2  -331    - -132 -119 -195 -177 -197 -163 -192 -230 -254 -284 -261 -291 -313 -356 
B3  -454 -283    - -122 -196 -179 -171 -234 -264 -293 -236 -319 -317 -324 -315 -333 
B4  -495 -332 -168    - -176 -159 -190 -185 -248 -259 -327 -321 -330 -421 -334 -374 
B5  -496 -309 -164 -132    - -125 -198 -151 -266 -244 -285 -250 -259 -346 -364 -338 
B6  -555 -286 -169 -126 -180    - -175 -163 -200 -243 -197 -284 -276 -340 -364 -351 
B7  -531 -358 -198 -140 -149 -138    - -159 -199 -250 -207 -230 -262 -272 -306 -345 
B8  -513 -323 -196 -159 -160 -140 -174    - -202 -220 -217 -259 -289 -311 -291 -328 
B9  -538 -338 -201 -156 -138 -145 -153 -150    - -204 -197 -237 -271 -306 -312 -315 
B10 -526 -336 -190 -190 -156 -134 -149 -151 -131    - -188 -187 -245 -228 -274 -292 
B11 -528 -321 -210 -161 -169 -136 -147 -131 -156 -190    - -245 -248 -275 -294 -325 
B12 -602 -296 -224 -191 -145 -148 -121 -148 -122 -166 -115    - -205 -234 -262 -287 
B13 -493 -331 -193 -127 -152 -136 -125 -167 -143 -182 -154 -164    - -226 -238 -236 
B14 -544 -345 -210 -181 -174 -149 -117 -137 -125 -158 -123 -147 -158    - -175 -218 
B15 -543 -298 -212 -150 -128 -103 -109  -89 -126 -148 -153 -181 -150 -195    - -202 
B16 -583 -329 -234 -151 -144 -129 -108  -83 -115 -134 -147 -158 -129 -161 -173    - 

In the first column the values have a descending trend, too. This means that 
in the space of test functions it is easier to find a test function for which an 
algorithm performing 16 mutations/chromosome is better than an algorithm performing 
1 mutation/chromosome than to find a test function for which an algorithm 
performing 2 mutations/chromosome is better than an algorithm performing 1 
mutation/chromosome. These results can be explained by the fact that B1 and B16 are 
very different, whereas B1 and B2 are very similar; it is more difficult to find test 
problems on which two very similar algorithms perform significantly differently. 

The lowest value in Table 5 is -583 and it corresponds to the pair (B16, B1). 
This means that finding a function for which an algorithm performing 
16 mutations/chromosome is better than an algorithm performing 1 
mutation/chromosome was the easiest operation. This suggests a rugged fitness 
landscape of the evolved test function. In order to confirm this hypothesis we have 
analyzed the landscape of the evolved test functions (30 functions obtained in 
30 runs) for the pair (B16, B1). Each test function was considered as having 1 
real-valued variable over the interval [0, 65535]. The average number of peaks (a 
point is considered a peak (local or global optimum) if its left and right 
values are higher than its value) of the evolved test functions was 23127 (out of 
65536 points). This suggests a highly rugged landscape. 
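
The peak-counting rule used in this analysis can be written in a few lines. This is a minimal sketch of that rule; treating only interior points as candidates (so the two end points are never counted) is an assumption, since the text does not say how the boundaries were handled.

```python
def count_peaks(values):
    """Count points whose left and right neighbors are both strictly higher."""
    return sum(
        1
        for i in range(1, len(values) - 1)
        if values[i - 1] > values[i] < values[i + 1]
    )

# Tiny illustrative landscape: the points at index 1 and index 3 qualify.
example = [3, 1, 2, 0, 5]
print(count_peaks(example))
```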

5 Conclusions and Further Work 

In this paper, a framework for evolving test functions that are matched to a 
given algorithm has been proposed. The proposed framework is intended to 
provide practical evidence for the NFL theories. Numerical experiments have 
shown the efficacy of the proposed approach: test functions for which Random 
Search performs better than all other considered evolutionary algorithms have 
been successfully evolved. 

Further research will be focused on the following directions: 

(i) Proving that the evolved test problems are indeed hard for the considered 
Evolutionary Algorithms. 

(ii) Comparing other evolutionary algorithms for single and multiobjective opti- 
mization. Several test functions matched to some classical algorithms (such 
as standard GAs or ES) for function optimization will be evolved. In this 
case the problem is much more difficult since the number of distinct solu- 
tions visited during the search process could be different for each algorithm. 

(iii) Evolving difficult test instances for algorithms used for solving other 
real-world problems (such as TSP [2], symbolic regression [10], classification, 
etc.). 

(iv) Finding the set (class) of test problems for which an algorithm is better 
than the other. 

Acknowledgment. The source code for evolving test functions matched to a 
given algorithm and all the evolved functions described in this paper are available 
at www.nfl.cs.ubbcluj.ro. 



Abstract. In this paper we analyze a simple adaptive model of competition 
called the Minority Game, which is used in analyzing competitive 
phenomena in markets. The Minority Game consists of many simple 
autonomous agents, and self-organization occurs as a result of simple 
behavior rules. Up to now, the dynamics of this game have been studied 
from various angles, but so far the focus has been on the macroscopic behavior 
of all the agents as a whole. We are interested in the mechanisms 
involved in collaborative behavior among multiple agents, so we focused 
our attention on the behavior of individual agents. In this paper, we suggest 
that the core elements responsible for forming self-organization are: 
(i) the rules place a good constraint on each agent's behavior, and (ii) there 
is a rule that leads to indirect coordination. Moreover, we tried to solve 
the El Farol bar problem based on our suggestions. 



1 Introduction 

An environment consisting of many autonomous entities, in which everyone can 
obtain a high profit even though each behaves selfishly, is very attractive for the 
market economy and similar settings. It is also very useful for coordinating agents in 
multi-agent systems. Recently, the mechanisms behind the movement of a school of fish 
or the formation of flying birds, where individual elements seem to be organized into 
well-ordered behavior as a whole by an intelligent coordination mechanism, 
have been studied as one class of complex systems. It has become clear 
that these well-ordered behaviors can be formed by only local and very simple 
interactions among elements. The Minority Game is a simulation program for 
analyzing models of adaptive competition that consist of many autonomous 
agents, such as a market. In it, many agents act with the purpose of obtaining 
their own profit by using only local information. It has been well studied in 
econophysics and examined from various other angles since it was first proposed 
more than four years ago [1-7, 9]. Many algorithms for this 
game have been proposed, but in nearly all of those studies researchers concentrated 
on the behavior of the agents as a whole; to date, we have not seen any papers 
that focus on the behavior of the individual agents of the game. 



A.J. Ijspeert et al. (Eds.): BioADIT 2004, LNCS 3141, pp. 484-495, 2004. 
© Springer-Verlag Berlin Heidelberg 2004 

Therefore, we investigated the behavior of the individual agents to discover 
what kind of individual behavior gives rise to efficient overall behavior. In particular, we 
pay attention to the standard Minority Game as first proposed [1], because in it 
well-ordered behaviors of agents can be formed by only local and very simple 
interactions among them, even though they behave selfishly. It has 
become clear that when all agents behave in a well-ordered manner and 
all obtain a high profit, fractal characteristics can be seen in the behavior of each 
agent. Finally, we tried to solve the El Farol bar problem, which is an extended 
version of the Minority Game, based on our suggestions. 



2 The Standard Minority Game 

First, we review the rules of the standard Minority Game (see Fig. 1 (a)). We 
have N agents, each of which is an autonomous entity that independently chooses 
between two alternatives (group 0 or group 1) according to its own behavior rules. 
In each round of the game, all of the agents choose one or the other alternative, 
and the agents that then finish in the minority group are considered to be winners 
and are each awarded one point, and the total awarded points of all agents is the 
profit of this round of the game. Therefore, the smaller the difference in number 
between the majority and minority groups, the better the result. 




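[Fig. 1 (b): strategy table for m = 3, pairing each combination of past winning groups with a next decision] 

Past winning groups    Next decision 
0 0 0                  0 
0 0 1                  1 
0 1 0                  0 
0 1 1                  0 
1 0 0                  1 
1 0 1                  0 
1 1 0                  1 
1 1 1                  1 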



S. Kurihara et al. 

Each agent makes its selection based on the strategy tables that it holds. A 
strategy table lists all 2^m combinations of the m most recent winning groups, 
together with the next decision that corresponds to each combination (see 
Fig. 1 (b)). At the beginning of the game, each agent randomly prepares s 
strategy tables, and the initial m-round history is also set randomly. In the 
first round of the game, each agent selects one of its s strategy tables at 
random. If the agent wins, that is, if it ends up in the minority group, one 
point is added as profit to the selected strategy table; one point is deducted 
for a loss. In the second and subsequent rounds of the game, the strategy table 
that has the highest number of points is always selected. This cycle is 
repeated for a predetermined number of rounds, and the final result of the game 
is the total number of points acquired by winning agents across all rounds. 
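The procedure described above can be sketched as a small simulation. This is a minimal illustration rather than the authors' implementation: the function name, the integer encoding of the history, and the random tie-breaking between equally scored tables are all assumptions.

```python
import random


def play_minority_game(n_agents=201, m=3, s=2, rounds=1000, seed=0):
    """Play the standard Minority Game; return the number of winners per round."""
    if n_agents % 2 == 0:
        raise ValueError("use an odd number of agents so a minority always exists")
    rng = random.Random(seed)
    n_hist = 2 ** m  # number of distinct m-round histories

    # tables[a][k][h]: next decision (0 or 1) of agent a's k-th table for history h
    tables = [[[rng.randint(0, 1) for _ in range(n_hist)] for _ in range(s)]
              for _ in range(n_agents)]
    points = [[0] * s for _ in range(n_agents)]
    history = rng.randrange(n_hist)  # random initial history, encoded as an integer

    winners_per_round = []
    for t in range(rounds):
        picks = []
        for a in range(n_agents):
            if t == 0:
                picks.append(rng.randrange(s))  # first round: random table
            else:
                best = max(points[a])
                # Assumption: ties between top-scoring tables are broken at random.
                picks.append(rng.choice(
                    [k for k in range(s) if points[a][k] == best]))
        choices = [tables[a][picks[a]][history] for a in range(n_agents)]
        ones = sum(choices)
        minority = 1 if ones < n_agents - ones else 0
        for a in range(n_agents):
            # Only the table that was actually used gains or loses a point.
            points[a][picks[a]] += 1 if choices[a] == minority else -1
        winners_per_round.append(min(ones, n_agents - ones))
        # Shift the newest winning group into the m-bit history.
        history = ((history << 1) | minority) % n_hist
    return winners_per_round
```

The returned list can be passed to `statistics.mean` and `statistics.pstdev` to obtain the kind of summary values plotted against m and s in the figures that follow.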




Fig. 2. Standard deviation of the number of winning agents 



2.1 Coordinated Behaviors 

We begin by verifying the collective behavior of the agents in the same way as 
in previous studies. First, the standard deviation of the number of winning 
agents is shown in Fig. 2. The game was played for various numbers of strategy 
tables possessed by the agents, s = 2, 5, 10, 16, 32, 64, and various history 
depths for the strategy tables, m = 3 to 16. For each parameter pair (s, m), 
the game was played for 10,000 rounds in one trial, and trials were run in sets 
of ten. Fig. 3 shows the mean value for the number of winning agents. In both 
Fig. 2 and Fig. 3, the horizontal lines represent the standard deviation and 
the mean value for the case where all of the agents made their choices 
randomly. These graphs show that the standard deviation became lower and the 
mean number of winning agents became higher than in the random selection case, 
mainly when m was 3 to 5. This means that a group ratio close to 100:101 arose 
systematically rather than by chance, which is to say that some kind of 
coordinated behavior among the agents was taking place. What is characteristic 
is that, although we expected behavior based on longer histories to be more 
efficient, the results show that when m became larger than 10, the results were 
the same as for random behavior. 

Fig. 3. Mean numbers of winning agents 

Even if the standard deviation is low for m = 3 to 5, a fixed set of agents 
might always win or always lose. So, we investigated how many times each agent 
won in several situations, and Fig. 4 shows the ranking of the 201 agents 
according to their average scores. In the case where each agent selected 
“group 0” or “group 1” randomly, all of the agents were able to get 
approximately 4750 points. In contrast, when the standard deviation was small 
(m = 3 to 5), the mean score was high and, although some differences between 
agents can be seen in the scores, all of the agents were able to achieve stable 
high scores. Interestingly, as m increased, the number of losing agents grew, 
and when m was greater than 10, the trend approached the level seen in the 
random selection case. So, it can be seen that a phase transition occurred at 
m = 6 to 9. 
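The random-selection reference level can be cross-checked analytically. The sketch below assumes that each of the N agents independently picks either group with probability 1/2, so that the size of group 1 is Binomial(N, 1/2); the function name is hypothetical.

```python
from math import comb


def random_baseline(n_agents=201, rounds=10_000):
    """Expected minority size and expected per-agent score under random choice."""
    total = 2 ** n_agents
    # E[min(X, N - X)] for X ~ Binomial(N, 1/2), computed exactly with integers.
    exp_minority = sum(comb(n_agents, k) * min(k, n_agents - k)
                       for k in range(n_agents + 1)) / total
    # By symmetry every agent wins a round with probability exp_minority / N.
    score = rounds * exp_minority / n_agents
    return exp_minority, score
```

For N = 201 and 10,000 rounds this gives an expected score of roughly 4,700 points per agent, the same order as the value of approximately 4750 quoted above for random selection.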

3 Key Elements to Form Self-Organization 

In the standard Minority Game, the following two rules are thought to be 
important elements: (i) all agents must look up the same entry of their 
strategy tables, determined by the winner-group history, and (ii) the points of 
each strategy table change with wins and losses. 

3.1 Winner-Group History Is Unnecessary 

To verify whether the winner-group history is necessary, we ran the following 
simulations. In the normal algorithm, if the current winner-group history 
is “010”, all agents look up the “010” entry of the strategy tables they hold, 
and each agent selects one of its strategy tables depending on their points. 



Fig. 4. Ranking of the 201 agents in each situation (axis label: score, i.e., 
number of wins) 



At this point, we raise the following questions: 

(1) If each agent is allowed to select the entry of strategy tables randomly, can 
self-organization occur? 

(2) If only one agent, agent-A, is allowed to select the entry by its own rule 
based on the winning-group history and the other agents use the same entry 
as agent-A, can self-organization occur? In other words, the agents do not 
follow the winner-group history themselves, but they all look up the same 
entry of their strategy tables. 

(3) If we intentionally generate a random winner-group history, can 
self-organization occur? 

If self-organization can be formed in any one of the above situations, our 
hypothesis that the memory of the winner-group history has no effect may be 
correct. Fig. 5 shows the result: the same self-organization as in the normal 
algorithm was formed in situations (2) and (3). In situation (3), even though 
the winner-group history was random, the agents formed a good organization. 
This result shows that the memory of the winner-group history may not be 
important in forming self-organization. However, when each agent selected the 
entry of its strategy tables randomly, as in situation (1), the behavior was 
the same as in the random selection version. 

Fig. 5. Is the memory necessary? (axis label: score, i.e., number of wins) 

Therefore, we can infer that the important point concerning the strategy 
tables is that, in the normal algorithm, the rule that all agents depend on the 
winner-group history acts as a constraint on the agents. In other words, if we 
can place a good constraint on the agents, as in situation (2) or (3), we may 
be able to use any kind of rule. Applying a good constraint to the agents is 
equivalent to decreasing their freedom. In [8], we also discussed how to form 
self-organization by using other frameworks, and made the same suggestion as in 
this study. 
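The situations compared in this subsection differ only in which strategy-table entry each agent consults. A minimal sketch of that choice follows; the function name, the variant labels, and the `leader_rule` callable standing in for agent-A's unspecified "own rule" are all hypothetical.

```python
import random


def lookup_indices(variant, true_history, m, n_agents, rng, leader_rule=None):
    """Return the strategy-table entry that each agent consults this round.

    variant:
      "normal"            - all agents use the real winner-group history
      "individual_random" - situation (1): each agent draws its own entry
      "leader"            - situation (2): agent-A maps the real history to an
                            entry with leader_rule, and everyone copies it
      "shared_random"     - situation (3): one random history shared by all
    """
    n_hist = 2 ** m
    if variant == "normal":
        return [true_history] * n_agents
    if variant == "individual_random":
        return [rng.randrange(n_hist) for _ in range(n_agents)]
    if variant == "leader":
        # leader_rule is an assumption: the text does not specify agent-A's rule.
        return [leader_rule(true_history) % n_hist] * n_agents
    if variant == "shared_random":
        shared = rng.randrange(n_hist)
        return [shared] * n_agents
    raise ValueError(f"unknown variant: {variant}")
```

In every variant except "individual_random" all agents read the same entry, which is the shared constraint that the discussion above identifies as the important element.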



3.2 Importance of the Indirect Coordination 



In the standard Minority Game, indirect interaction between agents is also 
considered to be an important element in forming organization. Each agent 
decides its behavior based on the results of each round, and its decision 
influences the behavior of the other agents indirectly. In the normal 
algorithm, if the agent wins, one point is added to the selected strategy 
table, and if it loses, one point is subtracted. For example, consider one 
agent and its two strategy tables, table-A and table-B. If table-A has 4 points 
and table-B has 1 point, table-B is not selected until table-A has lost at 
least 4 times in a row. If we change the rule for selecting the strategy 
tables, will self-organization still be formed? Fig. 6 shows the result of our 
investigation of this question. We implemented the following rules. 

(Version I) Agents select the strategy tables sequentially. The interval 
between switches is set randomly. 

(Version II) If the agent wins, one point is added to the selected strategy 
table, but if it loses, two points are subtracted. 

(Version III) If the agent loses one game, the strategy table is changed even 
if the points of this strategy table are still higher than the points of the 
other one. 
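The normal rule and the three modified versions can be sketched as follows for the two-table case (s = 2). The function names, the tie-breaking in favor of the lower table index, and the `max_interval` bound in Version I are assumptions made for this illustration.

```python
def update_points(points, used, won, version="normal"):
    """Update the points of the table that was used, in place."""
    if won:
        points[used] += 1
    else:
        # Version II penalizes a loss twice as heavily as the normal rule.
        points[used] -= 2 if version == "II" else 1


def next_table(points, used, won, version="normal"):
    """Choose the table for the next round (two tables assumed)."""
    if version == "III" and not won:
        return 1 - used  # switch after any loss, regardless of points
    # Highest points wins; on a tie the lower index is kept (an assumption).
    return max(range(len(points)), key=lambda k: points[k])


def sequential_schedule(s, rng, max_interval=10):
    """Version I: cycle through the s tables, holding each for a random time.

    max_interval is a hypothetical bound; the text only says the interval
    is set randomly.
    """
    k = 0
    while True:
        for _ in range(rng.randint(1, max_interval)):
            yield k
        k = (k + 1) % s
```

Starting from 4 points for table-A and 1 point for table-B, this sketch switches to table-B after four consecutive losses under the normal rule, after two under Version II, and after one under Version III.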
Fig. 6. Existence of indirect coordination (axes: score, i.e., number of wins, 
against place; figure annotation: many extreme winner and loser agents arose) 



Unfortunately, self-organization was not formed in any of these versions. The 
way of selecting the strategy table must be closely related to the initial 
combinations of the strategy tables of each agent. Therefore, we investigated 
how these strategy tables were used by each agent under the normal algorithm. 
Fig. 7 shows the transitions of the scores for each strategy table held by the 
25th-placed agent among the 101 agents (m = 3, s = 2). Both strategy tables 
were used, but there was no fixed period for the continuous use of one strategy 
table. As can be seen, a fractal characteristic is visible in these periods; 
that is, there was self-similarity in the use of strategy tables. Fig. 8 shows, 
on a log scale, the histogram of the periods over which one strategy table was 
continuously used by all of the agents (N = 101 and 301) for m = 3, 7, and 14. 
For m = 3, the graphs were nearly straight lines with gradients roughly between 
0.7 and 1. 
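The kind of analysis described here, collecting the lengths of the periods over which one table is used continuously and checking whether their histogram is a straight line on log-log axes, can be sketched as below. The function names and the plain least-squares fit are assumptions; the text does not say how the gradients were estimated.

```python
from collections import Counter
from math import log


def run_lengths(sequence):
    """Lengths of maximal runs of identical consecutive items."""
    runs, count, prev = [], 0, None
    for x in sequence:
        if count and x == prev:
            count += 1
        else:
            if count:
                runs.append(count)
            count = 1
        prev = x
    if count:
        runs.append(count)
    return runs


def loglog_slope(lengths):
    """Least-squares slope of log(frequency) against log(run length)."""
    hist = Counter(lengths)
    xs = [log(k) for k in hist]
    ys = [log(v) for v in hist.values()]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct run lengths")
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
```

Applied to the per-round sequence of table indices used by an agent, a power-law histogram yields a negative slope; the gradients of 0.7 to 1 quoted above presumably refer to its magnitude.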

For m = 14, however, where the performance of the agents was only as good as 
random selection, the histograms did not look like straight lines. This means 
that, when m = 14, the fractal character did not appear. The histogram for 
m = 7 was especially interesting: some graphs resembled those for m = 3, while 
others resembled those for m = 14. It is thought that the histograms of winning 
agents were nearly straight lines, whereas the histograms of losing agents were 
the same as in the m = 14 case. 

Further detailed investigation is thus necessary to clarify the essential 
mechanisms in the selection of the strategy tables, and this is a topic for 
future work. At the least, we may be able to use this difference between, for 
example, m = 3 and m = 14 to check and infer whether agents can perform in a 
well-ordered manner. 

Fig. 7. Which strategy table was used? (Top) 10000 games, (Middle) 1000 games, 
and (Bottom) 100 games 



4 Can We Resolve the El Farol’s Bar Problem? 

The El Farol’s bar problem [10] is similar to the Minority Game; the only 
difference is the following rule in the former case: a desired ratio between 
the numbers of agents in the minority and majority groups can be defined freely 
(e.g. 1:3 or 1:5). 

Fig. 8. (Top) For m=3 (all 101 agents and 301 agents), (Middle) For m=7 (all 
101 agents and 301 agents), (Bottom) For m=14 (all 101 agents and 301 agents) 

We tried to resolve this as follows. In the Minority Game, the strategy tables 
are generated so that items ’0’ and ’1’ are chosen with equal probability, but 
for the El Farol’s bar problem, the probabilities match the target ratio. So, 
if the pre-defined ratio is set to 1:3, we generate the strategy tables by 
selecting item ’0’ once for every three selections of item ’1’ (see Fig. 9). 
Such strategy tables mean, however, that the number of agents that select group 
0 is almost always smaller than the number selecting group 1, so we assign or 
deduct one point only when the strategy table’s next decision was group 0. The 
remaining rules are the same as the rules of the Minority Game. 
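The two modifications described above, biased table generation and scoring only group-0 decisions, can be sketched as follows. This reads "the probabilities match the target ratio" as independent sampling of each entry, which is an assumption, as are the function names; the criterion that decides whether an agent won is left to the caller because the text does not spell it out.

```python
import random


def make_biased_table(m, ratio=(1, 3), rng=random):
    """Strategy table whose entries are 0 or 1 in the given zeros:ones ratio."""
    zeros, ones = ratio
    p_zero = zeros / (zeros + ones)
    # One entry per m-round history; each is 0 with probability p_zero.
    return [0 if rng.random() < p_zero else 1 for _ in range(2 ** m)]


def update_points_el_farol(points, used, decision, won):
    """Add or deduct a point only when the table's next decision was group 0."""
    if decision == 0:
        points[used] += 1 if won else -1
```

For a ratio of 1:3 and m = 3, a table with two 0 entries and six 1 entries is a typical outcome of this sampling.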



Past m winning groups | Next decision 
----------------------+-------------- 
0 0 0                 | 0 
0 0 1                 | 1 
0 1 0                 | 1 
0 1 1                 | 1 
1 0 0                 | 1 
1 0 1                 | 0 
1 1 0                 | 1 
1 1 1                 | 1 

e.g., pre-defined ratio Item ’0’ : Item ’1’ is 1:3 = 2:6 

Fig. 9. Strategy table for the El Farol’s bar problem 



Table 1 shows the standard deviation in the numbers of agents that won the 
game by using our algorithm and by random selection. In every situation, the 
standard deviation of our algorithm was lower than that for random selection. 
Fig. 10 shows the ranking of the 101 agents by average score when the 
pre-defined ratio was set to (group 0):(group 1) = 1:1, 1:2, 1:3, and 1:7. As 
can be seen, most agents received a higher score with the algorithm. 



Table 1. Standard deviation of the number of agents that selected “group 0” 

Number of agents in Group 0 : Group 1 | Our algorithm | Random 
--------------------------------------+---------------+------- 
1:1                                   | 0.95          | 5.02 
1:2                                   | 1.75          | 4.89 
1:3                                   | 0.61          | 4.40 
1:7                                   | 1.06          | 3.38 
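As a rough cross-check of the Random column, one can assume that each of 101 agents independently selects group 0 with the probability implied by the ratio. Both the agent count (taken from the 101 agents mentioned for Fig. 10) and the independence are assumptions, since the text does not say how the random-selection values were produced.

```python
from math import sqrt


def binomial_std(n_agents, ratio):
    """Std. dev. of the group-0 count if agents choose independently."""
    zeros, ones = ratio
    p = zeros / (zeros + ones)
    # Standard deviation of a Binomial(n_agents, p) count.
    return sqrt(n_agents * p * (1 - p))
```

Under these assumptions the four ratios give about 5.0, 4.7, 4.4 and 3.3, close to the reported random-selection values.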



However, as can be seen in Fig. 10, extreme winners and losers arose. So, 
although the total points acquired by winning agents became higher than in the 
random selection version, the El Farol’s bar problem cannot be considered 
clearly solved by the current rules. 

In related work on the El Farol’s bar problem, Nakayama resolved it by using 
his proposed “cognitive congestion model” [11], which is based on a GA 
technique, and Sato and Namatame [12] resolved it by adding a new behavior rule 
to the agents: if they win a trial, they intentionally act to lose the next 
trial of the game for the sake of other agents. Although both works resolved 
the problem well, the methodology in [11] took a long time for the behavior of 
the agents to converge, and as for [12], we consider that agents intentionally 
trying to lose their profit cannot be seen as selfish behavior. Even if each 
agent behaves selfishly, all agents can get high profit as a whole: this is the 
most interesting point of the Minority Game. Therefore, we tried to resolve the 
El Farol’s bar problem in a similar way as for the Minority Game. Further 
detailed investigation is also necessary to clarify an effective rule for 
selecting strategy tables in the El Farol’s bar problem, but that is for future 
work. 

Fig. 10. Ranking of 101 agents in each situation 



5 Conclusion 

In this paper we analyzed a simple adaptive competition model called the 
Minority Game, which is used in the analysis of competitive phenomena in 
markets, and verified the validity of the following suggestions: (i) the rules 
place a good constraint on each agent’s behavior, and (ii) there is a rule that 
leads to indirect coordination. As for (i), A. Cavagna [13] made basically the 
same suggestion as ours. His suggestion is that, in the Minority Game, the 
memory of the winner group is irrelevant, and the important point is to share 
some data among agents. We want to suggest instead that the important point is 
to give agents a good constraint on their behavior; sharing some data is one 
such constraint, and in the normal algorithm this shared data happens to be the 
winning-group history. As for (ii), a detailed investigation will be necessary 
to clarify the essential mechanism for selecting strategy tables. Finally, we 
tried to solve the El Farol’s bar problem, and currently, agents can behave in 
a well-ordered manner based on our proposed suggestion. Of course, we will 
continue with further analysis with the aim of establishing a general algorithm 
that can be applied to other competition problems. 
Abstract. Biologically inspired approaches to the design of general IT are 
presently flourishing. Investigating the scientific and historical roots of the 
tendency will serve to prepare properly for future biomimetic work. This paper 
explores the genealogy of the contemporary biological influence on science, 
design and culture in general to determine the merits of the tendency and 
lessons to learn. It is argued that biomimetics rests on bona fide scientific 
and technical reasons for the pursuit of dynamic IT, but also on other more 
external factors, and that biomimetics should differentiate the relevant from 
the superficial. Furthermore, the search for dynamic capacities of IT that 
mimicking adaptive processes can bring about is put forward as both the history 
and raison d’être of biomimetics. 



1 Lifelike - à la Mode 

Biology is enjoying enormous attention from different scientific fields as well 
as culture in general these days. Examples are legion: the victorious 
naturalization project in philosophy and psychology spearheaded by cognitive 
science in the second half of the 20th century; the exploration of biological 
structures in the engineering of materials or architectures [1]; a dominant 
trend of organismoid designs with ‘grown’ curves replacing straight lines to 
convey a slickness and efficiency not previously associated with life;¹ World 
Expo 2005 being promoted under the slogans “Nature’s Wisdom” and “Art of 
Life”;² and biology’s new status as the successor of physics as the celebrity 
science which gets major funding and most headlines. 

These examples are neither historically unique nor culturally revolutionary. Life and 
nature have been fetishized before. Yet the fascination with the living has never pre- 
viously dominated with such universality and impetus, as we presently experience. So 
we might ask: What is the reason for this ubiquitous interest in life and is it a result of 
cultural and scientific progress or merely an arbitrary fluctuation soon to be forgotten 
again? 

In order to prepare properly for future biologically inspired approaches to IT 
design, this paper investigates the roots of the biological dominance by 
reconstructing the recent history of techno-scientific ideas, a history that 
can be characterized as a pursuit of dynamic IT. The objective is to distill 
important lessons learned to provide good conditions for the continued effort 
to develop dynamic IT through biomimetic design by identifying the proper 
challenges to embark on and dead ends to avoid. 

¹ Think of cars, sports apparel, furniture, mobile phones, watches, sunglasses etc. 
² http://www.expo2005.or.jp/ 

A.J. Ijspeert et al. (Eds.): BioADIT 2004, LNCS 3141, pp. 496-512, 2004. 



2 Biomimetics: Definition, Characteristics, and Motivations 

The first step in this investigation of biologically inspired approaches to IT 
design³ is clarifying and qualifying the notion of biomimetics [2, 3]. 
‘Biomimetics’ has been chosen as the best unifying notion for biologically 
inspired approaches to design of dynamic artifacts, being intuitively 
descriptive and adequately precise. If the following analysis reveals a 
specific or even idiosyncratic notion of biomimetic design, it hopefully 
nonetheless contributes to an increased awareness of the conceptual foundation 
for biologically inspired approaches in general and helps prevent 
misunderstandings and conceptual vacuity. 

According to Merriam-Webster’s online dictionary, biomimetics is: 

the study of the formation, structure, or function of biologically produced 
substances and materials (as enzymes or silk) and biological mechanisms and 
processes (as protein synthesis or photosynthesis) especially for the purpose 
of synthesizing similar products by artificial mechanisms which mimic natural 
ones⁴ 

This definition covers two slightly different meanings of biomimetics: 

1. The artificial synthesis of naturally occurring materials, substances or other struc- 
tural configurations. 

2. Mimicking biological processes in creating life-like products. 

Both meanings concern the synthesis of specific materials or structures, i.e. the syn- 
thesis of a certain ’end product’, and they merely differ in how directly and in which 
manner the product is brought about. Biomimetics thus characterized is not an appro- 
priate label for an IT design methodology. Instead I would like to put forward a defi- 
nition of biomimetics more suitable for the approach: 

3. The mimicking of complex self-organizing natural processes to obtain dynamic 
artifacts harboring adaptive and self-maintaining capacities. 

Whereas 1) and 2) concern the creation of fait accompli products, a biomimetic IT 
design methodology 3) instead creates dynamic ‘produces’, i.e. evolutionary proc- 
esses involving IT devices that adapt in use [4]. 

This does not mean that biomimetics is ‘anti-materialistic’. On the contrary, a 
better integration of software and hardware will become an important objective 
for biomimetics: firstly, in an effort to enhance physical objects and spaces 
with digital dimensions, providing novel functions free of spatial constraints 
[5]; secondly, to develop self-assembling, evolvable and re-configurable 
materials for a ‘deeply’ and ‘integrated’ dynamic IT adaptive at both software 
and hardware level; thirdly, in the study of computation itself, with models 
integrating structural and topological characteristics of natural computation 
as it occurs at the chemical level (e.g. ‘lock and key’). In addition, 
integrative design becomes a necessity as increasingly miniaturized hardware 
starts having idiosyncratic characteristics due to microphysical effects. Such 
minuscule hardware will have to be functionally coupled with software in some 
way, and collective ‘growth’ seems a good remedy for heterogeneous 
characteristics. 

³ Design is a broader notion than engineering and covers all aspects of creating 
artifacts (methodological, aesthetic, sociological etc.) whereas engineering 
only concerns the concrete bringing forth of the artifact. 
⁴ http://www.m-w.com/cgi-bin/dictionary 



2.1 Biomimetics: Design for Dynamics 

A biomimetic design methodology capitalizing on the self-organizing capacities 
of evolutionary processes might appear to be a contradiction in terms. ‘Design’ 
normally characterizes an intentional and teleological, strictly human 
practice, whereas natural order arises ‘blindly’ and post hoc from variation, 
selection and retention cycles. Biomimetic design thus either means 
‘un-designed design’ or erroneously projects intentionality into adaptation. 

According to [6] a crisp distinction between the teleology of design and the 
causality of evolution does not stand up to close inspection. The argument 
goes, correctly (cf. [7, 8]), that because the human brain itself basically 
operates by amplification of fluctuations (variation, selection and retention 
dynamics), thoughts and ideas are evolutionarily selected post hoc rather than 
deliberately created de novo. The difference between cognition and other 
evolutionary dynamics therefore becomes merely ontologically regional and not 
intrinsic. In other words, if our designing skills are just the result of 
high-level evolutionary processes, there is no essential difference. 

The middle way, which I will put forward, is that there is a significant 
difference between human design and other evolutionary processes, if only in 
degree and not in kind, but that this does not render the idea of biomimetics 
incoherent. Despite some terminological fuzziness and the merit of the argument 
of [6] with respect to cognition, the notion of biomimetics nonetheless 
adequately covers the specific design methodology under scrutiny here. Whatever 
the micro-processes underlying cognition, there is a difference between the 
emergent macro-process of human deliberation and the agent-less achieving of 
solutions merely by due means (e.g. by an autonomously self-organizing 
technology). In fact a biomimetic methodology is quite different from 
conventional design approaches, and the outcome no less different. The notion 
of design simply changes when the role of agency in designing is distributed 
and even hard to identify, as is the case for example with evolutionary 
algorithms. The standard notion of design as a top-down controlled act no 
longer holds if parts of the design emerge from self-organizing processes [4]. 
Moreover, identifying the ‘agency’ responsible for a specific state of affairs 
is pivotal for psychological and ethical issues related to technology, and this 
becomes relevant with increasingly autonomous technology. 

The concept of biomimetics also needs qualification in a different sense. The 
principle of nature primarily mimicked by a biomimetic approach to IT design - 
adaptive dynamics - is not exclusively biological. Much research suggests that 
the self-organization of groups by variation, selection and retention is a 
universal ordering principle governing everything from basic physical processes 
such as crystal growth to sociological processes such as the proliferation of 
ideas [7, 8, 9, 10]. Biology is just one, albeit very prominent, domain of 
evolutionary dynamics and happened to be the field first described by such 
terms. Again this lack of terminological adequacy is not harmful if the notion 
is deployed rigorously for the specific approach characterized in this paper. 

On the basis of this terminological clarification, biomimetics can be 
characterized. Biomimetics is a design methodology for complex artifacts, 
deployed to support human design with self-organizing evolutionary mechanisms. 
Biomimetics is a ‘meta-governing’ approach in the sense that it retains human 
control of the overall functional norms of artifacts while exploiting 
evolutionary processes to provide the functions required (cf. [11]). 
Biomimetics simultaneously improves our design and, in specific circumstances, 
reduces the labor going into it by leaving some parts of design to evolutionary 
self-organization. Biomimetics thus seeks to capitalize on the respective (and 
complementary?) strengths of evolutionary processes and human creative and 
teleological capabilities [29]. 

There is a range of reasons why a general bio-inspired tendency has arisen and 
most probably will continue to grow within IT research. Let us take a look at 
some of these to get a better picture of the nature of biomimetic IT design.⁵ 

One of the main challenges facing IT design is finding the means to develop and 
maintain ever-growing IT systems. IBM’s Autonomic Computing Project⁶ is 
motivated by estimations that the development and maintenance of future IT 
systems will be impossible without new ways of designing such systems. IT needs 
to take care of itself, and living systems provide so far the only examples of 
just this capacity. Life has developed means to evolve, develop and learn by 
adaptive dynamics, and since we have sciences concerned with the organization 
of adaptive systems - primarily biology but also younger transdisciplinary 
fields such as dynamic and complex systems theory - it is instructive to 
consult models and theories from these fields when developing future IT. 

Second, our cognitive capabilities are evolutionarily constrained and we simply 
cannot fully overview, let alone control, very complex structures or processes. 
History is filled with examples of how technologies turned out differently than 
expected and escaped from our control, and we have no reason to believe that 
this is about to change.⁷ Who could predict that the surfing behavior resulting 
from the introduction of the remote control would change the very content of TV 
broadcasting [5], or that SMS-organized mobs would bring about social change 
because of a simple feature on mobile phones [12]? Faced with highly 
non-linear, dynamic and complex systems, our cognitive capacities are simply 
inadequate and leave us without a chance when trying to analyze the long-term 
and global consequences that become increasingly important with growing 
systems. Add to this fluctuating user practices, increasing proportionally to 
the freedom technology provides. Acknowledging our limited powers, we should 
abandon the notion of a fully controlled top-down design process and join a 
fruitful alliance with some of nature’s benevolent ordering principles. 

⁵ For supporting or additional reasons to apply biomimetics see [1, 2, 3, 4, 9, 11, 13] 
⁶ http://www.research.ibm.com/autonomic/ 
⁷ The uncontrollable nature of technology is the very basis for the argument of 
‘technology determinism’ popular among luddites and variants of philosophies of 
technology. 

Third, modern theories of complex adaptive systems have gained valuable 
knowledge on dynamic processes and offer models for the local behavior of 
constituents as well as global characteristics of systems. This conceptual 
toolbox will prove immensely important in designing IT systems as nested and 
hierarchical complexes of systems interacting on multiple levels. Further, the 
theories facilitate scaling to avoid disintegrated ‘stratified’ views.⁸ The 
quality of future technology depends on a better general understanding of 
interacting systems on different scales, from device-device to whole networking 
societies. IT has to be designed in ways that accommodate rapidly changing 
practices, mobile, long-distance and trans-media corporations and other forms 
of (unforeseeable) changes of conditions. From the design of individual devices 
to the general organization of IT systems, architectures and protocols must be 
mutually supportive to carry biomimetics to its full strengths [2, 3]. 



3 Genealogy of 20th Century Bio-centrism 

The present interest in life is not historically unique, but seems to occur 
periodically. In relation to biologically inspired design of IT, technology has 
always been modeled after, as well as been a model for, the dominant conception 
of life. This dialectic stems from our desire to understand and master nature. 
To (re-)produce is to comprehend - verum et factum convertuntur - has been the 
credo through scientific history. Hence by recreating life we might hope to get 
behind the veil of nature’s mystery and peek into God’s workshop. The only 
variations in this perennial dream have been the changing epochs’ metaphysical 
conceptions of life. For example: the ancient sculptures created in dirt, 
thought to be one of the four basic elements; Hellenic hydraulic automata 
modeling Aristotelian ‘motivation’ and ‘movement’ (‘movere’ is the etymological 
root of motion, emotion and motivation alike); the intricate mechanical animals 
and chess players of the dawning mechanistic natural science; the steam-driven 
machines of 19th century thermodynamics; the postwar computational robots; and 
self-organizing ALife at the turn of the millennium. 

The bio-techno pivot is completed by the fact that the technological 
reproduction of nature is fueled by man’s perpetual religious and pragmatic awe 
for the ingenuity of nature’s ‘design’. This awe is so firm that it has been 
difficult to convince people (many are still not convinced!) that the ‘design’ 
of nature in fact emerges spontaneously by self-organizing processes without 
any teleological agency. 

However, scientific developments since the late 19th century paved the way for 
a hitherto unparalleled blossoming of our fascination with the living and not 
least the efforts to make good use of our insight into its governing 
principles. During the last century science became increasingly preoccupied 
with systems, complexity, dynamics and information, phenomena that are all 
notoriously manifest in organisms, and thus biology naturally took the center 
of the scientific stage. At the same time pollution entered the stage and 
sympathy for nature rose. After a following period of dichotomization of nature 
and technology, scientific insights into the wonders of complex systems 
proliferated. From being pictured as slow, vulgar, dirty and beastly, nature 
suddenly became appreciated for speed, immunity, efficiency and economy: 
capacities normally identified with hi-tech but now borrowed from nature. 

⁸ As instrumental as the ‘software - hardware’ distinction is descriptively, it 
might turn out to be a methodological hindrance for future IT design if taken 
ontologically. 

In the following sections I will provide a brief reconstruction of the main 
scientific and technological tendencies of the 20th century that led to the 
present interest in living systems. The reconstruction is divided into 
historical themes, which should not be taken rigidly. Historical overviews are 
by nature selective and begin in medias res. Besides, I have no wish to 
undertake a comprehensive review of recent scientific history. To simplify and 
clarify matters cybernetics will be our protagonist, both as our point of 
departure and as the undercurrent of the genealogy of biomimetics. 



3.1 Cybernetics 

Like all other cultural manifestations, cybernetics did not arise in a vacuum but grew 
from scientific development embedded in a historical context [14]. Important scien- 
tific discoveries and accumulated knowledge fertilized the ground for the converging 
ideas for research in missile-guiding systems, animal behavior, sociology, neurology 
and computation around WWII. Cybernetics was founded by a more or less coherent 
movement of various leading scientists and engineers who met from 1946 to 1953 at 
the so-called Macy conferences (after the Josiah Macy Jr. foundation, which funded 
the meetings). Moved by discoveries made during the previous decades within 
mathematics, physics, biology and chemistry the participants set out to explore proc- 
esses in complex systems. In fact the very idioms ‘complexity’ and ‘system’, together 
with ‘feedback’, ‘circular causality’ and ‘information’, were created by the cyberneti- 
cians in their pursuit of fundamental models for dynamic systems. One of the primary 
motivations - at least among central figures such as Norbert Wiener, Arturo Rosen- 
blueth and Warren McCulloch - was the similarity between certain mechanical and 
biological processes. In particular, the purpose-guided behavior of animals, or ‘teleology of 
organisms’ as they put it themselves, became the model for self-adjusting machines. 
Purposeful behavior seemed basically to consist in adjusting behavior by recursively 
computing the difference between the present state and the reference state. This simple 
feedback-cum-computation model of such an almost supernatural phenomenon as teleology 
promised further solutions to hard philosophical riddles of the mind. 
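
The feedback principle described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a reconstruction of any historical device: the function names, the proportional correction rule and the `gain` parameter are assumptions introduced here.

```python
from typing import List


def feedback_step(state: float, reference: float, gain: float) -> float:
    """Apply one correction: compare the present state with the reference state."""
    error = reference - state      # the difference that drives the behavior
    return state + gain * error    # adjust in proportion to that difference


def run_feedback(state: float, reference: float, gain: float = 0.5,
                 steps: int = 20) -> List[float]:
    """Repeat the correction and record the trajectory (hypothetical helper)."""
    trajectory = [state]
    for _ in range(steps):
        state = feedback_step(state, reference, gain)
        trajectory.append(state)
    return trajectory


trajectory = run_feedback(state=0.0, reference=10.0)
print(round(trajectory[1], 3), round(trajectory[-1], 3))
```

With a gain between 0 and 1 the remaining error shrinks by a constant factor at every step, so the state settles on the reference although nothing in the loop represents a 'goal' beyond the comparison itself.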

For early cybernetics the computation taken to be the substrate of teleology was, 
unlike in its successors classical AI (GOFAI) and cognitive science, conceived of as 
strictly mechanical [15]. The important difference is that the computational paradigm 
for intelligence and semantics represented by GOFAI and cognitive science took 
computation to consist of rule-guided manipulation of symbolic entities already endowed 
with meaning or gaining meaning by the syntactical operations themselves. 
Cybernetics did not operate with such ‘semantic computation’. Intentionality and 
semantics were instead taken to be, at best, higher-level phenomena arising from computational 
processes. In contrast, GOFAI and cognitivism seemed to sneak in semantics 
through the backdoor via syntactical sleight of hand, by projecting higher-level 
characteristics into semantic, intentional or normative building blocks with their genesis 
indefinitely far back in evolutionary history. The cybernetic idea, which is being 
echoed today in modern cognitive science, is that if seemingly irreducible phenomena 
relating to the mind cannot be explained (either as real phenomena or as ‘folk psychology’) 
as the result of processes deprived of such qualities, the same phenomena cannot 
logically be explained by evolution. Then they must somehow be put into each 
organism (by God or miracle). A static understanding of mental capacities dictates 
that either they have always existed beside the physical system (a scientifically unsatisfying 
dualism) or they are just illusions (phenomenologically inadequate). So, as 
in Hegelian dialectics, cybernetics gives the modality of such elusive phenomena the 
position between pure being and non-being, namely becoming. Intelligence is explained 
as a process capacity [16, 17, 18]. 

Worth noting is the cybernetic intuition of what we today call emergence of global 
systematic capacities (e.g. purpose guided behavior, adaptivity and self-maintenance). 
Through a full-blooded and honest adherence to a mechanical conception of algo- 
rithms early cybernetics acknowledged the need for a dynamic conception of the 
mind.^ Properties arising from complexity such as non-linear dynamics and self- 
organization formed the core of this emergentist explanation of purposeful behavior 
arising from mechanical processes, that is, mechanical in the sense of formally describable 
processes giving rise to self-regulating behavior without any entelechy or teleology in 
the standard sense. Cybernetics thus placed itself between the reductionism of traditional 
physicalism and the transcendentalism of (some) philosophical approaches, by 
stressing the scientific importance of mathematical models while making room for the 
autonomy of emergent levels of description. This turn represented an early version of 
a slowly propagating undercurrent of science towards interest in processes and dy- 
namics instead of traditional predominantly atomistic and structural scientific models. 
In more grandiose terms cybernetics manifested a general movement from “substance 
metaphysics” to “process metaphysics” of 20th century science [16]. 

Over the years cybernetics began stressing the contribution of the system itself in 
processing input to behavior. This ‘second wave’ of cybernetics reached its zenith 
with [19], which argued for the ‘constructivism’ of systems orchestrating their inner 
organization in response to interactions with their environment. Whereas early cyber- 
netics sometimes resembled behaviorism, its dominant precursor, by the early 1970s it 
had reached the opposite pole with theories of self-organization and ‘autopoiesis’ of 
Humberto Maturana and Francisco Varela. These theories opened up the ‘black box’ 
of cognition to an extent that seemed to occlude everything external. From being a 
general philosophy of complex systems cybernetics had come to stress epistemology. 
Yet, in the true spirit of cybernetics, such self-organizational epistemological characteristics 
were still viewed as integrated aspects of the overall self-maintenance of 
adaptive systems [20]. 



3.2 The Ratio Club 

On the other side of the Atlantic, early cybernetics had a stepsister, heavily interwoven 
with the Americans, but prominent enough to deserve separate treatment here. 



^ It must be noted that the cyberneticians did not form a homogenous group advancing a single 
coherent theory. For differences in conceptions of ‘mechanics’ see [15]. 




The Genealogy of Biomimetics: Half a Century’s Quest for Dynamic IT 






In London a group including Alan Turing, W. Grey Walter and Ross Ashby, named 
The Ratio Club, met to discuss scientific and engineering questions also partly motivated 
by work on war machinery. In their interaction with the American cyberneticians, 
the Ratio Club members came to play an important role as founders of 
second-wave cybernetics and its increased focus on self-organization. 

Although Ross Ashby only participated in a single Macy meeting (the 9th), he was 
very influential on cybernetics. At his only appearance Ashby presented his ideas on 
the ‘homeostat’. The homeostat is an (abstract) adaptive automaton, which keeps essential 
parameters at equilibrium by interaction with its surroundings while letting 
other parameters fluctuate when required. The homeostat provided a formalization of 
the self-maintaining principles of complex adaptive systems, as manifested by life: 
the dynamic balancing between ‘freezing’ into total order and disintegrating into pure 
chaos. The homeostat depicted complex adaptive systems as inherently “poised at the 
edge of chaos”, as it was later phrased by one of Ashby’s famous pupils, Stuart Kauffman. 
Ashby was thus responsible for placing complexity and self-organization at the 
heart of principia cybernetica and its subsequent metamorphosis into its second wave. 

The neurologist and engineering genius Grey Walter is an excellent example of the 
spirit of the time and the renaissance-like feats it fostered. Walter conducted pioneering 
research in neurology (including pioneering work on the EEG) and his work on defense 
systems resulted in the radar display still common in marine and aviation control 
today. But his explicitly biomimetic work on the two autonomous robot ‘tortoises’, 
Elmer and Elsie, stands out as the most visionary. On a purely mechanical architecture 
the tortoises were designed with bio-analogue, feedback-guided ‘needs’, which gave 
rise to seemingly ‘motivated’ and spontaneous behavior. Endowed with simple photo 
and tactile sensors, they had phototactic capabilities enabling them to locate their 
lit hut when ‘hungry’, i.e. when their batteries needed recharging, and to leave it 
again when the batteries were charged and the light, due to a switching mechanism, 
became aversive. Walter’s work remains an astounding study in ingenious robotics 
and a milestone in biomimetic history as an early example of the prospects for implementing 
even idealized biological principles. It is worth quoting Walter J. Freeman’s 
praise of Walter’s work at length: 

The significance of Walter's achievements can be understood by recognizing that 
these complex adaptive behaviors came not from a large number of parts, but from a 
small number ingeniously interconnected. These devices were autodidacts. They 
learned by trial and error from their own actions and mistakes. They remembered 
without internal images and representations. They judged without numbers, and rec- 
ognized objects without templates. They were the first free-ranging, autonomous 
robots capable of exploring their limited worlds. They are still the best of breed. [...] 
His devices were the forerunners of currently emerging machines that are governed 
by nonlinear dynamics, and that rely on controlled instability, noise, and chaos to 
achieve continually updated adaptation to ever-changing and unpredictable worlds. 
He can well be said to have been the Godfather of truly intelligent machines [21]. 



3.3 Evolutionary Computing 

If cybernetics soon became neglected by its offspring, computer science, GOFAI and 
cognitive science, its current resurrection was nevertheless prepared by developments 
within the family itself. In the 1960s and 1970s the computer scientist John Holland 
developed Genetic Algorithms (GA) as a general way of creating software solutions by 
evolutionary adaptive processes. Sidestepping questions of intentionality, the mind 
and other philosophical issues, Holland showed how algorithms mimicking evolutionary 
processes were very potent in searching, problem solving and coping with 
uncertainty and change. In contrast to other post-cybernetic biologically inspired 
approaches to computational engineering, Holland was not interested in optimization 
or solutions to specific engineering problems per se. He wanted to model adaptation 
formally, leading to a general understanding [22]. The GA model Holland presented in 
[23] consisted of algorithms organized in a population of competing chromosomes coding for 
different solutions to a given problem. Chromosomes coding for successful solutions 
were reproduced by the exchange of genetic material (via crossover) and random 
mutation. Due to the speed of computers, evolving solutions over many generations 
allowed for fast, reliable and almost automated software programming. 
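
The population, crossover and mutation cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Holland's formulation in [23]: the toy fitness function (counting 1-bits), tournament selection, the elitism step and all names and parameter values are choices made here for brevity.

```python
import random
from typing import List, Tuple

Chromosome = List[int]


def fitness(chrom: Chromosome) -> int:
    """Toy objective (an assumption): the more 1-bits, the better the solution."""
    return sum(chrom)


def crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    """Exchange genetic material at a single random cut point."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(chrom: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Flip each bit independently with a small probability."""
    return [1 - gene if rng.random() < rate else gene for gene in chrom]


def select(population: List[Chromosome], rng: random.Random, k: int = 3) -> Chromosome:
    """Tournament selection: the fittest of k randomly drawn candidates reproduces."""
    return max(rng.sample(population, k), key=fitness)


def evolve(pop_size: int = 30, length: int = 20, generations: int = 40,
           mutation_rate: float = 0.02, seed: int = 0) -> Tuple[Chromosome, List[int]]:
    """Evolve a population; return the best chromosome and the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(fitness(c) for c in population)]
    for _ in range(generations):
        children = [max(population, key=fitness)]  # elitism: the best survives unchanged
        while len(children) < pop_size:
            child = crossover(select(population, rng), select(population, rng), rng)
            children.append(mutate(child, mutation_rate, rng))
        population = children
        history.append(max(fitness(c) for c in population))
    return max(population, key=fitness), history


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best chromosome is carried over unchanged, the best fitness in the sketch can never decrease from one generation to the next; without that elitism step it could occasionally drop.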

What Holland’s work made clear was that, faced with unknown tasks, changing conditions 
or other uncertainties, a population of candidates undergoing heritable variation 
together with a good selection heuristic is a potent strategy. Holland’s work demonstrated how 
nature’s principles for problem solving were - at least in some domains - reproducible 
and generally applicable. When exposed to the prisoner’s dilemma, the traveling 
salesman or other non-trivial computational tests, GAs proved to be reliable and 
remarkably fast in finding solid solutions and ‘rational’ strategies. The results were 
very convincing and seemed to provide a powerful tool for dynamic automated problem 
solving. So although GAs were of great theoretical strength, it was their pragmatic value 
that paved the way to the prominent status that evolutionary techniques enjoy 
today. Engineering-focused computer scientists, normally not interested in other 
fields, suddenly realized the value of theoretical cross-fertilization. GAs bestowed 
genuine adaptive dynamics upon standard architectures, and self-organizing technology 
took a significant step forward. 



3.4 Neural Networks 

Parallel and more architecturally focused developments within computer science were 
to place cybernetics on the agenda more directly. The rise of neural network theory, 
or connectionism as it was soon named, in the 1980s was a reemergence of cybernetic 
ideas. In their seminal paper from 1943 McCulloch and Walter Pitts described the 
brain as a network of simple neuronal units each firing according to the net-sum of 
inhibitory and excitatory inputs and the firing potential of the neuron [24]. Many 
heavily interconnected neurons facilitate interesting higher-level computation by 
simply firing or not due to the non-linearity of their collective behavior. Such digital 
dynamics resembled bivalent logic, thought to be the essence of reasoning, and the 
analogy to human thinking was clear. Their ideas soon gave birth to a new architecture 
for computation called artificial neural networks. However, for various technical 
and historic reasons, the von Neumann architecture remained dominant. 
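
The threshold unit described above, and its resemblance to bivalent logic, can be illustrated with a short sketch. It follows the net-sum reading given in the text (excitatory minus inhibitory input compared with a threshold); the 1943 paper is usually presented with absolute inhibition instead, so this is a simplification. The function names and thresholds are assumptions made here.

```python
from typing import Sequence


def threshold_unit(excitatory: Sequence[int], inhibitory: Sequence[int],
                   threshold: int) -> int:
    """Fire (1) if the net sum of binary inputs reaches the threshold, else stay silent (0)."""
    net = sum(excitatory) - sum(inhibitory)
    return 1 if net >= threshold else 0


def logic_and(a: int, b: int) -> int:
    return threshold_unit([a, b], [], threshold=2)   # both inputs must be active


def logic_or(a: int, b: int) -> int:
    return threshold_unit([a, b], [], threshold=1)   # one active input suffices


def logic_not(a: int) -> int:
    return threshold_unit([], [a], threshold=0)      # an active inhibitory input silences the unit


def logic_xor(a: int, b: int) -> int:
    """A small network of units: (a OR b) AND NOT (a AND b)."""
    return logic_and(logic_or(a, b), logic_not(logic_and(a, b)))


for a in (0, 1):
    for b in (0, 1):
        print(a, b, logic_and(a, b), logic_or(a, b), logic_xor(a, b))
```

No single unit of this kind computes exclusive-or; it appears only when several units are wired together, which gives a small-scale sense of how interconnection yields higher-level computation.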

In the early eighties the connectionist approach took a leap forward, aided by improved 
hardware and increased scientific attention. The results were promising 
enough to attract attention from mainstream computer science and AI. The primary 
attraction of neural networks lay in their capacity to model learning and adaptation, 
which GOFAI was not able to provide. In addition, with the renewed interest in biology 
the model gained favor through its greater biological correspondence. So although 
neural networks were highly idealized and only partly flexible, as they must be 
trained anew for every new task, they still had a biological flavor long missing. 

Neural nets remain burdened by architectural hurdles today as such networks take 
vast amounts of interconnections to be of practical interest. So far neural networks are 
mostly simulated on conventional platforms. Yet, the jury is still out as to whether 
neural nets hold the key to future dynamic IT. Given the overall qualities of neural 
nets demonstrated thus far, it is worth developing methods for creating large-scale 
neural nets with complex architectures. This has the potential of providing a serious 
alternative to conventional computers. Again software-hardware integration seems to 
take center stage, because neural net architectures may provide the means to revolu- 
tionize this aspect of computing. 



3.5 New AI and Evolutionary Robotics 

In response to very poor results within GOFAI, especially when compared with the 
self-confidence displayed at the outset, roboticists started suggesting new ways of conceiving 
intelligence in the late 1980s [25]. In the place of symbolic computation, low-level 
motor capacities were put forward as the basis of cognition. AI and robotics 
became heavily biologically inspired and turned their interest from human-level reasoning 
and language to simple animals and their embodied negotiation of the environment. 
Intelligence was no longer taken as an isolated capacity of a discrete system 
but as a descriptive term for the interactions between an autonomous system and its 
environment. Biological notions such as ‘development’, ‘emergence’ and ‘functional 
coupling’ came into favor in New AI and robotics. Thus roboticists started implementing 
ideas from evolutionary computing to develop control mechanisms, and even 
morphology and physiology, for both virtual and physical agents. From being marginal 
ideas, notions of decentralization, bottom-up organization and, not least, embodiment 
had by the mid-nineties become dominant concepts in robotics, New AI and 
cognitive science, and buzzwords within most other academic disciplines involving 
cognition. 



3.6 Artificial Life 

Simultaneously with the (re-)emergent focus on embodiment and interactive processes 
in New AI, cognitive science and robotics, another adjacent field was forming. 
Building on the theoretical foundations of molecular biology and computer science, 
researchers started studying (some hoped to create) life in silico, or Artificial Life 
(ALife) as Christopher Langton baptized the field in 1987 [26]. The marriage of molecular 
biology and computer science was straightforward due to an underlying functionalism 
à la GOFAI, regarding life as consisting in computational processes on 
information stored in digital DNA. Thus whether the substrate of the life investigated 
was carbon or silicon was a somewhat irrelevant empirical matter, at least for so-called 
‘strong ALife’. By not arbitrarily confining focus to the carbon-based systems 
we happen to know, ALife could contribute substantially to a general study of life - 
“life as it could be” [26]. 

Even if the metaphysics of this self-proclaimed pioneer field represents the zenith of a 
reductionistic computationalism (as represented in physics by Stephen Wolfram’s 
radical algorithmic theory of the universe), a lot of valuable work relating to the evolutionary 
capacities of software has been done. With its refusal to limit its scope to 
things we are familiar with, ALife provides inspiration for biomimetics, although sometimes 
on the verge of science fiction. 

3.7 Swarm Intelligence 

In close relation to the work within New AI, robotics and ALife, emergentist models 
grew from ethology and biology as well. By studying the heavily collaborative proc- 
esses of social insects such as bees, wasps, ants, and termites, valuable knowledge 
about the rise of productive global functions of swarms of individuals was gained. 
Similarly to neuronal networks, swarms of insects carrying out relatively simple tasks 
proved capable of rather complex feats. By exploiting strikingly simple organizational 
methods, social insects were shown to behave as a unified intelligent super-organism. 
Examples include ants capable of foraging with mathematically optimal distribution and 
finding shortest paths to food sources, and termites practicing advanced agriculture and 
building architecturally impressive nests [13]. 

Social insects widely use indirect communication in their grand collective labor. 
Stigmergy (from the Greek ‘stigma’ = sting and ‘ergon’ = work) is a good example of 
indirect communication by (re-)configuration of the environment, which evokes a 
specific subsequent behavior in an animal. Stigmergy refers to a triggering effect, 
as when a hole in a wall prompts an ant to put in the missing pellet of dirt. In this way 
the organization of building is distributed structurally into the environment and arises 
ad hoc through self-organization. 

An example of stigmergy is the chemical organization by pheromone trails. By leaving 
trails of evaporating pheromones ants have a dynamic communication system 
allowing for efficient organization. The principle is as reliable as it is simple and 
consists in pure ‘mechanics’. The trail used by the ant first returning from foraging is 
likely to have the most powerful scent because of the overlaying of the outgoing and 
returning trails. Through the chemotactic navigation of other ants following the trail, 
it becomes incrementally enhanced. Soon all alternatives - the longer routes - are 
excluded, leaving only one short ‘highway’. Such a reliable, flexible and cost-saving 
way of communicating is very instructive for the design of embedded IT systems. 
Research in swarm phenomena (e.g. as presented by [13], which is specifically focused 
on implementation) provides interesting new ways of organizing complex technological 
systems by letting the order rise bottom-up from the units themselves. What 
is particularly interesting about swarm phenomena is the possibility of getting cooperative 
behavior relatively cheaply and with simple individual constituents. The advantage 
of swarm organization stems from the fact that the difficulty of designing systems 
grows exponentially with infrastructural complexity. Deploying a number of simple 
devices for the same task remedies this by decreasing complexity immensely. Besides, 
swarms (like networks) malfunction much more gracefully because of parallelism and 
distribution, and are generally more robust. Since centralization is becoming less and less 
opportune, let alone possible, reliable ways of facilitating global functions are imperative. 
Swarming seems a promising method. 
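
The reinforcement of the shorter trail can be made concrete with a small numerical sketch. It is an expected-value model of two alternative routes rather than a simulation of individual ants, and every modeling choice is an assumption made here: ants pick a route in proportion to its pheromone level, the deposit per time step is inversely proportional to route length (shorter routes are completed more often), and a fixed fraction of pheromone evaporates each step.

```python
from typing import List


def simulate_trails(short_len: float = 1.0, long_len: float = 2.0,
                    ants_per_step: float = 10.0, evaporation: float = 0.1,
                    steps: int = 200) -> List[float]:
    """Return the expected share of ants choosing the short route at each step."""
    tau_short, tau_long = 1.0, 1.0   # both routes start with equal pheromone
    shares = []
    for _ in range(steps):
        p_short = tau_short / (tau_short + tau_long)   # choice follows the scent
        shares.append(p_short)
        # Evaporation weakens both trails; new deposits reinforce the routes in use.
        tau_short = (1 - evaporation) * tau_short + ants_per_step * p_short / short_len
        tau_long = (1 - evaporation) * tau_long + ants_per_step * (1 - p_short) / long_len
    return shares


shares = simulate_trails()
print(round(shares[0], 3), round(shares[10], 3), round(shares[-1], 3))
```

Evaporation is what keeps the system dynamic in this sketch: a trail that stops being reinforced fades, so the colony is not locked into a route once its advantage disappears.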

Like neural networks and evolutionary computing, swarm intelligence is already 
widely in use. For example, in network switches the model of pheromone trails has 
been mimicked with great success to handle massive information distribution by optimizing 
bandwidth usage and preventing bottlenecks. 



3.8 Biologically Inspired IT Systems: Autonomic Computing 

Several large-scale initiatives of systematically applied biological inspiration have been 
launched over the last couple of years by major players in the commercial IT field. The 
Autonomic Computing project from IBM is a good example of how biologically 
inspired approaches are starting to dominate IT design broadly [11].^10 
The project addresses issues related to ever-growing IT systems and the urgent need 
for creating self-maintaining and self-organizing systems. The goal is to create IT 
systems that calmly and autonomously take care of maintaining themselves and providing 
assistance without detailed specifications of all subroutines and solutions. Just 
as the autonomic systems of higher organisms work in the background, leaving 
more mental energy for interesting and creative tasks, autonomic computing is an 
initiative to make the time spent with IT meaningful. 

The architecture suggested consists of multiple semi-autonomous devices adapting to 
changing circumstances and needs by following individual (high-level) objectives 
provided by the programmers. Thus optimized functionality and infrastructure 
emerge (evolve and develop) through the interaction between users and the systems and 
among the devices themselves. Though the Autonomic Computing project mainly 
regards infrastructure issues, such evolutionary dynamics are equally important for 
providing improved assistance at the interface level [2, 3, 4]. 



4 The Viability of Biomimetics 

So far the genealogy of biomimetics reconstructed seems a glorious march toward 
total victory, but let us pause before this happy ending sinks in too deeply. First of all, 
the picture appears optimistic because the previous account focused on the genealogy 
of contemporary biomimetics and deliberately left out most conflicting nuances. Second, 
because there is no ‘end’ to history, but only continuous flux, history will 
undoubtedly move on after a biomimetic heyday. So let us examine the scientific foundation 
of biomimetics in order to equip it for the productive years to come. 

^10 http://www.research.ibm.com/autonomic/ 



4.1 Biomimetic Considerations: Constraints and Freedom 

Generally biomimetics should remain pragmatic and focused, with the due self-constraint 
of a methodology. Naturalism within philosophy and psychology, (neo-romantic) 
environmentalism and the other tendencies alongside which biomimetics has bloomed 
are by nature ideological. But even if biomimetics does ride on an 
ideological wave, it is itself only transiently normative as a method to obtain better 
technology. So ideology should not be its fuel. If biomimetics builds on the slippery 
foundation of a trend it will most likely vanish together with the trend. Hype, however 
nice when one is the object of it, must be strictly avoided. 

Another concern is the argumentum ad verecundiam fallacy: referring to an improper 
authority. Biology owes a lot of the current attention to the fact that genetics has not only 
become a hot scientific topic but has gained widespread cultural interest as ‘the secret 
code of life’. The resulting ‘gene chauvinism’ that has dominated most biology for the 
last fifty years, i.e. the intense focus on DNA as the structural blueprint of all life, 
provided a lot of spotlight - but often for the wrong reasons. The notion of DNA being 
a blueprint or program for the ontogenesis of the organism, as expressed by daily 
stories in the news about ‘scientists who have isolated the gene for X and Y’, has 
turned out to be overly simplistic [27]. Development is far more complex and non-linearly 
entangled with the actual environment of the organism. Most developmental 
biologists are turning towards a system-process approach regarding the functions of 
genes, where genes are not the “selfish” agent of development but merely one, albeit 
important, resource for the self-organizing system [28]. 

Biomimetics must avoid falling prey to gene-chauvinistic folk biology. The concern 
is that it gets seduced into wedlock with the digital architecture by the mutual resonance 
of molecular biology and computational theory. Even if basing design ideas on 
conventional digital architectures is necessary as a pragmatic beginning, biomimetics 
must be careful not to get theoretically tangled up with such linear and/or atomistic 
approaches. By sticking to an outdated genetic view and merely applying convenient 
but shallow analogies, biomimetics risks getting cut off from alternative paths^11 to new 
IT. Biomimetics should rest on qualified insight into biology if it nurtures ambitions 
beyond the metaphorical buzz. 

On the other hand, biomimetics should not be blindly committed to biological fidelity 
[29]. First of all, fundamental issues within biology itself remain unresolved. 
To avoid getting sucked into a black hole of biological debate, biomimetics 
needs to practice a cautious pragmatism regarding its biological foundations. Secondly, 
as a design methodology it is committed to taking full and creative advantage of 
the freedom from natural constraints. Biological evolution is heavily path dependent, 
opportunistically tinkering and myopically seeking merely local optima in the fitness 
landscape. Besides, biological evolution is mostly slow compared to other types of 
development (e.g. cultural and technological). Hence evolution might be the most 
efficient general adaptive strategy, but it can be improved upon in a range of specific cases 
by intentional guidance. Biomimetics should strategically capitalize on the most powerful 
aspects of biological processes and human design respectively. 

^11 Cultural evolution is path dependent - as is its biological counterpart - but by contributing to 
new conceptual ‘scaffoldings’ we can influence cultural change and enhance the creativity of 
future IT design. 



4.2 Dynamic IT 

I hope to have made plausible the general popularity of biology as a consequence of a 
general scientific movement towards theories of systems, complexity and processes. 
This development not only describes the historical genesis of biomimetics but - so I 
will claim - its raison d’être. Biomimetics, like its predecessors, is embarking 
on the challenge of creating dynamic IT. This challenge brings about a lot of 
changes, some of which go to the bottom of our conventional understanding of design. 
Specifically, we will need to address a lot of global and long-term issues inherent in 
dynamic complex systems. And in spite of the temptation to generalize from specific 
instances of success, biomimetics must keep in mind that ‘solutions’ in nature only 
seem finished from a limited perspective. Only the meta-solution of adaptive dynamics is 
universal. Even though copying structural and material configurations will become 
increasingly important, it will be for their dynamic capacities and not because of 
solidity, flexibility or other physical characteristics. The intrinsic quality and power of 
living processes lies in their dynamic capacities. Adaptive processes are continuous 
‘negotiations’ and cannot be conceptualized as solutions. Copying a specific design 
and implementing it in a different setting risks missing the point unless the pragmatic 
value is clear. What should be the cardinal virtue of biomimetic design is translating 
the self-organizing capacities of natural evolutionary dynamics into design to facilitate 
ongoing adaptation and self-maintenance in IT devices [2, 3, 4, 29]. 

4.2.1 Dynamic Remedies and Dynamic Maladies 

In general, biomimetics will address new types of questions arising with pervasive 
dynamic systems. The characteristics that give such systems tremendously powerful 
and interesting functionalities also bring along new types of problems: Dynamic sys- 
tems are vulnerable to dynamic failures. To reverse a famous quote from Martin Hei- 
degger’s writing on technology: ‘But where the saving power is, grows danger also’. 
So in the euphoria of creating new types of technology, biomimetic designers must 
not forget to consider the long-term and large-scale consequences of such dynamic 
architectures [29, 30]. 

In general, resilience, oscillation and propagation phenomena will be important issues 
for the design of dynamic systems: on the positive side, to create mutually supportive 
and robust systems; on the negative side, to avoid destructive oscillatory or cascading 
effects. From cybernetics we have learned the importance of dampening feedback 
functions to avoid chaotic dynamics, and there will be a range of other short- and 
long-term dynamic phenomena to consider. Dynamic systems are intrinsically path 
dependent and historic, and accordingly biomimetic design will have a strong temporal 
dimension new to most conventional IT design. 
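
Why feedback has to be dampened can be shown with a toy loop in which only the strength of the correction varies. The sketch is an assumption-laden illustration (a discrete proportional correction with a positive `gain`; the names and the three labels are introduced here), and what it exhibits is over-correction and growing oscillation, not chaos in the technical sense.

```python
from typing import List


def signed_errors(gain: float, reference: float = 1.0, steps: int = 30) -> List[float]:
    """Record the signed error of a proportional feedback loop (assumes gain > 0)."""
    state = 0.0
    errors = []
    for _ in range(steps):
        state += gain * (reference - state)   # the correction applied at each step
        errors.append(reference - state)
    return errors


def describe(gain: float) -> str:
    """Classify the loop from the sign changes and the size of its errors."""
    errors = signed_errors(gain)
    overshoots = any(a * b < 0 for a, b in zip(errors, errors[1:]))
    if overshoots and abs(errors[-1]) >= abs(errors[0]):
        return "sustained or growing oscillation"
    return "damped oscillation" if overshoots else "smooth convergence"


for gain in (0.5, 1.8, 2.2):
    print(gain, describe(gain))
```

The same corrective mechanism that stabilizes the loop at a low gain destabilizes it at a high one, which is one simple instance of a dynamic remedy turning into a dynamic malady.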

In relation to a general study of resilience and robustness in IT, a way of designing 
‘immune systems’ that dynamically fight malicious code will be central. Writing in the 
aftermath of another massive blackout in the US, I find a focus on the epidemic effects of 
large-scale and massively interconnected IT systems imperative. If we succeed in 
creating immune systems for IT, new issues will emerge. Such immune systems might 
globally malfunction and give us computer AIDS or even autoimmune defects. 



5 Closing Remarks 

Living nature is in vogue these years, and naturally state-of-the-art technology 
design is influenced by the trend. However, trends come and go as fleeting perspectives 
on the world, and they do not provide suitable foundations for scientific theories. 
For biomimetics to stand its best chance of contributing significantly to future IT 
design it must have a clear understanding of its premises and goals. This paper has 
tried to prepare the ground and provide some of the stones for a better foundation. 

I have pointed to the general scientific shift during the 20th century, first manifested 
by cybernetics and later disseminating to more fields, towards interest in complexity, 
organization and processes as the main reason for the massive interest in living systems 
in science, technology and design. 

Many factors contributed to this development, the most relevant of which this paper has 
identified. Some of the circumstances that led to the rise of biomimetics, such as 
gene chauvinism and environmentalism, ought not to form the basis for a future biomimetic 
design of IT if the approach is to be more than a historical curiosity. However, the 
scientific reasons for the development towards interest in the organization and dynamics 
of complex systems do offer valuable guidance for the design challenges 
ahead. Thus factors stemming from these two different sources should be identified 
and kept separate in order to avoid a lot of futile lip service. 

I have argued that, in analogy with its genealogy, the proper focus for biomimetic IT 
design is matters of dynamics in complex systems. Mimicking finished designs of 
nature might indeed be productive for some tasks, but it should not be the focus for 
biomimetics. The challenge of designing highly dynamic IT calls for models of adap- 
tive self-organizing systems capable of managing on the fly rather than fixed solu- 
tions however ingenious. A dynamic approach does not only remedy our limited ca- 
pacities for predicting future needs and behaviors in complex systems, but is the most 
adequate response to an inherently fluctuating reality. 

Biomimetics is not likely to become, or even if so to remain, the dominant approach 
to IT design. It is, after all, part of a trend and trends inherently change. However a 
general dynamic approach to design is likely to dominate more permanently as we 
learn to master self-assembling, self-organizing, and reconfigurable structures. Bio- 
mimetics might fade with scientific progress and the likely unveiling of more univer- 
sal characteristics ‘behind’ living processes, leaving biology an arbitrary realm of 
reality to model. Until then our insights into the self-organizing processes of nature 
nonetheless offer invaluable heuristics for designing dynamic IT. 